Anyone know how to stream a large file and get a digest from it? Since we can't `File.read` a file over about 2GB, I went through the docs but didn't see a way to pass an `IO` to `Digest::Base`.
There used to be a limit that file contents had to fit in a 32-bit number, but I don't know if that is still the case. What error are you getting?
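I tried to do it via:

```crystal
require "digest/sha1"

large_file = "3gb.file"
File.open(large_file) do |file|
  slice = Bytes.new(256_000_000)
  digest = Digest::SHA1.hexdigest do |ctx|
    # read returns the number of bytes read, and 0 at end of file
    while (bytes_read = file.read(slice)) > 0
      # feed only the bytes actually filled on this read
      ctx.update(slice[0, bytes_read])
    end
  end
  puts digest
end
```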
puts digest.to_slice.hexstring
But it throws the same arithmetic overflow error I get when calling `File.read` on the 3GB file.
The error in particular is:

```
Unhandled exception: Arithmetic overflow (OverflowError)
  from ../../../../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/digest/sha1.cr:38:19 in 'update_impl'
  from ../../../../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/digest/base.cr:107:5 in 'update'
  from ../../../../../../../../../play:19:9 in '__crystal_main'
  from ../../../../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/crystal/main.cr:105:5 in 'main_user_code'
  from ../../../../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/crystal/main.cr:91:7 in 'main'
  from ../../../../../../../../../usr/local/Cellar/crystal/0.35.1_1/src/crystal/main.cr:114:3 in 'main'
```
Looks like files are now 64-bit capable, but slices are not.
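To make the size limit concrete: a Crystal `Slice` stores its size as an `Int32`, so a single `Bytes` buffer cannot hold more than `Int32::MAX` bytes (about 2 GiB). The sketch below uses an illustrative 3 GiB figure (not the exact size of `3gb.file`) and shows that such a file cannot fit in one slice and has to be processed in chunks, here sized like the 256 MB buffer from the code above.

```crystal
# A Slice's size is an Int32, so one Bytes buffer tops out at Int32::MAX bytes.
max_slice_bytes = Int32::MAX.to_i64

# Illustrative file size: 3 GiB, held in an Int64.
three_gib = 3_i64 * 1024 * 1024 * 1024

fits_in_one_slice = three_gib <= max_slice_bytes
puts "3 GiB = #{three_gib} bytes, slice limit = #{max_slice_bytes} bytes"
puts "fits in one slice? #{fits_in_one_slice}"

# Number of 256 MB chunks needed to cover the file (ceiling division).
chunk_size = 256_000_000_i64
chunks = (three_gib + chunk_size - 1) // chunk_size
puts "chunks of #{chunk_size} bytes needed: #{chunks}"
```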
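It's a bug in the SHA-1 code: it should be using wrapping arithmetic. Could you please report it? Thank you!

This line (the `sha1.cr:38` frame in your trace) should be:

`@length_low &+= 8`

And similarly for the `+= 1` a bit below that. In Crystal, `+=` raises `OverflowError` when the result doesn't fit, while `&+=` wraps around.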
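To illustrate why the wrapping operator matters, here is a sketch using a hypothetical `BitCounter` class (not the actual `Digest::SHA1` source). It assumes the bit length is kept as a 64-bit count split into two `UInt32` words, which is what the `@length_low` name suggests. Adding 8 bits per byte to the low word wraps after 2^32 bits (512 MiB of input), so the low word must wrap with `&+=` and carry into the high word; the checked `+=` raises instead, matching the reported error.

```crystal
# Hypothetical 64-bit bit-length counter split into two UInt32 words.
# Not the real Digest::SHA1 implementation, just the same idea.
class BitCounter
  getter low : UInt32
  getter high : UInt32

  def initialize(@low : UInt32 = 0_u32, @high : UInt32 = 0_u32)
  end

  # Record one more byte (8 bits). `&+=` wraps instead of raising;
  # when the low word wraps to exactly 0, carry one into the high word.
  def add_byte : Nil
    @low &+= 8
    @high &+= 1 if @low == 0
  end

  def total_bits : UInt64
    (@high.to_u64 << 32) | @low.to_u64
  end
end

# Start 8 bits short of 2**32 to reach the wrap point without hashing 512 MiB.
counter = BitCounter.new(low: UInt32::MAX - 7)
counter.add_byte
puts "low=#{counter.low} high=#{counter.high} total=#{counter.total_bits}"

# The same step with checked `+=` raises, which is the bug in the thread.
overflowed = false
low = UInt32::MAX - 7
begin
  low += 8
rescue ex : OverflowError
  overflowed = true
  puts "checked add raised: #{ex.message}"
end
```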
@asterite Perfect, I wanted to confirm it was a bug before opening an issue.
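A current workaround is to use OpenSSL:

```crystal
require "openssl"

large_file = "3gb.file"
File.open(large_file) do |f|
  slice = Bytes.new(256_000)
  # DigestIO updates the MD5 digest with every byte read through it
  io = OpenSSL::DigestIO.new(f, "MD5")
  # Drain the file; read returns 0 at end of file
  while io.read(slice) > 0; end
  puts io.hexdigest
end
```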
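For comparison, here is a pure-Crystal sketch, assuming a Crystal release where the `Digest::SHA1` overflow discussed above is fixed. `hexdigest_of` is a hypothetical helper name: it reads any `IO` in fixed-size chunks, feeds only the bytes actually read into the digest, and never holds more than one chunk in memory. The check at the end hashes a short string with a deliberately tiny chunk size to confirm that chunked hashing matches one-shot hashing.

```crystal
require "digest/sha1"

# Hypothetical helper: hash any IO in fixed-size chunks with Digest::SHA1.
def hexdigest_of(io : IO, chunk_size : Int32 = 256_000) : String
  buffer = Bytes.new(chunk_size)
  Digest::SHA1.hexdigest do |ctx|
    # IO#read returns the number of bytes read, and 0 at end of input.
    while (bytes_read = io.read(buffer)) > 0
      # Feed only the filled part of the buffer; the rest is stale data.
      ctx.update(buffer[0, bytes_read])
    end
  end
end

# Usage with a real file (path is illustrative):
# File.open("3gb.file") { |f| puts hexdigest_of(f) }

# Sanity check: a tiny chunk size must give the same digest as one-shot hashing.
data = "The quick brown fox jumps over the lazy dog"
chunked = hexdigest_of(IO::Memory.new(data), chunk_size: 7)
one_shot = Digest::SHA1.hexdigest(data)
puts chunked
puts chunked == one_shot
```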
@kalinon Not related to your original inquiry, but you might want to check out Blake3 if you’re going to be hashing large files regularly. @Didactic.Drunk has an implementation for Crystal. https://github.com/didactic-drunk/blake3.cr