```crystal
struct Foo
  def initialize(@foo : String)
  end
end

pp Foo.new("bar").hash
```
Every time I run this, I get a different `UInt64` result. How can I get a value that is consistent across runs? I guess I could use `OpenSSL::SHA1`, but I couldn't make it work due to a missing `#string` method…
In my case I don't need cryptographic security. I just want a unique value that changes whenever any of a `Foo`'s instance variables changes and stays consistent across runs. With a SHA1 digest I get 1.13M IPS with this code:
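```crystal
require "openssl"
require "benchmark"

struct Foo
  def initialize(@foo : String)
  end
end

foo = Foo.new("bar")
digest = OpenSSL::Digest.new("SHA1")
io = IO::Memory.new

Benchmark.ips do |x|
  x.report do
    io.clear                   # drop the previous iteration's output
    foo.inspect(io)            # writes e.g. Foo(@foo="bar")
    digest.update(io.to_slice) # hash exactly the bytes just written
    hash = digest.hexfinal     # finalize and hex-encode (Crystal >= 0.36)
    digest.reset               # a finalized digest must be reset before reuse
  end
end
```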
I'd expect SHA-1 to be pretty fast: one benchmark I've seen puts it at 909 MiB per second. Maybe there's some overhead in using OpenSSL, or at least in the way Crystal calls into OpenSSL?
You might want to pick up the minimal C source for one of the faster digests and see what happens if you call it directly from Crystal (or rewrite it in Crystal, if you're more ambitious!). I happened to want to try out BLAKE2b yesterday, so I know a nice small repository for that source is at:
One nice thing about BLAKE2 is that you can choose the size of the digest, from 1 to 64 bytes.
The dumb little C test program I wrote yesterday processes 2,397 files (37,169,518 bytes in total) in 0.21 seconds. There are two versions of BLAKE2: BLAKE2s for CPUs with 32-bit ints, and BLAKE2b for CPUs with 64-bit ints. My simple program used BLAKE2b.
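Short of binding the reference C code, a quick way to try BLAKE2b from Crystal is through OpenSSL itself. A minimal sketch, assuming the linked OpenSSL is 1.1.0 or newer, which registers BLAKE2b under the digest name `"BLAKE2b512"`; note that this route gives a fixed 64-byte output rather than the selectable size the reference code offers. The helper name `blake2b_hex` is hypothetical.

```crystal
require "openssl"

# Assumption: OpenSSL >= 1.1.0 exposes BLAKE2b (fixed 64-byte output)
# under the EVP digest name "BLAKE2b512".
def blake2b_hex(data : String) : String
  digest = OpenSSL::Digest.new("BLAKE2b512")
  digest.update(data)
  digest.hexfinal # 64 bytes -> 128 hex characters
end

puts blake2b_hex(%(Foo(@foo="bar")))
```

Timing this inside the same `Benchmark.ips` harness would show whether the algorithm or the OpenSSL call overhead dominates.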
In your benchmark above, I don't think it's SHA1 that is slow, but turning the digest into a hex string:
```crystal
require "openssl"
require "benchmark"

struct Foo
  def initialize(@foo : String)
  end
end

foo = Foo.new("bar")
digest = OpenSSL::Digest.new("SHA1")
io = IO::Memory.new

Benchmark.ips do |x|
  x.report "SHA1 no-hexdigest" do
    io.clear
    foo.inspect(io)
    digest.update(io.to_slice) # feed bytes only, never finalize
  end

  x.report "SHA1 hexdigest" do
    io.clear
    foo.inspect(io)
    digest.update(io.to_slice)
    hash = digest.hexfinal # finalize and hex-encode (Crystal >= 0.36)
    digest.reset           # make the digest reusable
  end

  x.report "nohash" do
    io.clear
    foo.inspect(io) # baseline: serialization only
  end
end
```
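If the hex encoding really is the expensive step, one option (assuming whatever consumes the value accepts binary data) is to keep the raw digest from `#final` and skip `#hexfinal` altogether. A minimal sketch, assuming Crystal 0.36 or newer, where a finalized digest must be `#reset` before it can be reused:

```crystal
require "openssl"

digest = OpenSSL::Digest.new("SHA1")
data = %(Foo(@foo="bar"))

raw = digest.update(data).final # Bytes: the 20-byte SHA1 digest, no encoding
digest.reset                    # required before reusing a finalized digest

hex = digest.update(data).hexfinal # String: the same digest as 40 hex characters
digest.reset

puts raw.size # => 20
puts hex.size # => 40
```

The raw form is half the size and avoids one pass over the output; `raw.hexstring` recovers the hex form when it's needed for display.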
It works, and it's consistent and fast, but the result will differ between machines because of endianness (little- vs. big-endian). And unfortunately it isn't suitable for my purposes…
I need this because a JSON serialization result can be of arbitrary length and I'm going to store it in Redis, so I'm trying to optimize things as much as possible.
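The use case can be sketched as turning an arbitrary-length JSON serialization into a fixed-length Redis key. This assumes `JSON::Serializable` for the serialization and SHA1 as the digest; the `cache_key` helper and the `"foo:"` prefix are hypothetical.

```crystal
require "json"
require "digest/sha1"

struct Foo
  include JSON::Serializable

  def initialize(@foo : String)
  end
end

# Hypothetical helper: whatever the JSON length, the SHA1 hex digest is
# always 40 characters, so the key length is fixed.
def cache_key(obj) : String
  "foo:" + Digest::SHA1.hexdigest(obj.to_json)
end

puts cache_key(Foo.new("bar"))
```

Because the key is derived only from the serialized bytes, it is stable across runs and changes whenever any serialized field changes.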
I also managed to try https://github.com/ysbaddaden/siphash.cr, and it performs better than FNV on big strings. I've compared these three, SHA1 (via OpenSSL), FNV and SipHash, on strings of 10, 100, 500 and 10,000 bytes. I've drawn my conclusions, and I'm going to let the developer choose which algorithm to use for the job.
Benchmark code:

```crystal
require "openssl"
require "benchmark"
require "siphash" # https://github.com/ysbaddaden/siphash.cr
require "./fnv1a" # local FNV-1a implementation (not shown)

payload_10 = ("a" * 10).to_slice
payload_100 = ("a" * 100).to_slice
payload_500 = ("a" * 500).to_slice
payload_10000 = ("a" * 10000).to_slice

sha1_digest = OpenSSL::Digest.new("SHA1")
siphash_key = StaticArray(UInt8, 16).new(0_u8)

Benchmark.ips do |x|
  # SHA1: reset the shared digest, hash the payload, hex-encode the result
  # (#to_s on the digest object would not return the digest itself).
  x.report "sha1 10" do
    sha1_digest.reset
    hash = sha1_digest.update(payload_10).hexfinal
  end
  x.report "sha1 100" do
    sha1_digest.reset
    hash = sha1_digest.update(payload_100).hexfinal
  end
  x.report "sha1 500" do
    sha1_digest.reset
    hash = sha1_digest.update(payload_500).hexfinal
  end
  x.report "sha1 10000" do
    sha1_digest.reset
    hash = sha1_digest.update(payload_10000).hexfinal
  end

  x.report "fnv1a 10" do
    hash = FNV.fnv1a_32(payload_10).to_s
  end
  x.report "fnv1a 100" do
    hash = FNV.fnv1a_32(payload_100).to_s
  end
  x.report "fnv1a 500" do
    hash = FNV.fnv1a_32(payload_500).to_s
  end
  x.report "fnv1a 10000" do
    hash = FNV.fnv1a_32(payload_10000).to_s
  end

  x.report "siphash 10" do
    hash = SipHash(1, 3).siphash(payload_10, siphash_key).to_s
  end
  x.report "siphash 100" do
    hash = SipHash(1, 3).siphash(payload_100, siphash_key).to_s
  end
  x.report "siphash 500" do
    hash = SipHash(1, 3).siphash(payload_500, siphash_key).to_s
  end
  x.report "siphash 10000" do
    hash = SipHash(1, 3).siphash(payload_10000, siphash_key).to_s
  end
end
```
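The benchmark requires a local `./fnv1a` file whose contents aren't shown. A minimal stand-in, assuming `FNV.fnv1a_32` computes standard 32-bit FNV-1a (offset basis 2166136261, prime 16777619); this module is hypothetical, not the author's file. Because FNV-1a consumes its input one byte at a time, the resulting integer does not depend on the machine's byte order.

```crystal
# Hypothetical stand-in for ./fnv1a: standard 32-bit FNV-1a.
module FNV
  OFFSET_BASIS_32 = 2166136261_u32
  PRIME_32        =   16777619_u32

  def self.fnv1a_32(data : Bytes) : UInt32
    h = OFFSET_BASIS_32
    data.each do |byte|
      h ^= byte.to_u32  # XOR in the next byte first (the "1a" ordering)
      h = h &* PRIME_32 # then multiply, wrapping modulo 2**32
    end
    h
  end

  def self.fnv1a_32(data : String) : UInt32
    fnv1a_32(data.to_slice)
  end
end

puts FNV.fnv1a_32("a").to_s(16) # => e40c292c
```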
Crystal doesn't currently run on any big-endian architecture, and it probably never will; big-endian architectures are rare these days. I wouldn't worry about it and would just use Crystal::Hasher.
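The reason `#hash` changes between runs is that `Crystal::Hasher` is seeded with random bytes when the program starts, as a defense against hash-flooding attacks. A minimal sketch of the suggestion above, assuming `Crystal::Hasher.new` accepts two `UInt64` seeds as it does in current stdlib sources: build the hasher with a fixed seed and feed the struct through `#hash(hasher)`. The `stable_hash` helper and the seed values are hypothetical, and since `Crystal::Hasher` is an internal type, its output may change between Crystal versions.

```crystal
struct Foo
  def initialize(@foo : String)
  end
end

# Hypothetical helper: any fixed pair of seeds gives run-to-run stable output.
def stable_hash(obj) : UInt64
  hasher = Crystal::Hasher.new(0x1234_u64, 0x5678_u64)
  # Struct#hash(hasher) mixes every instance variable into the hasher.
  obj.hash(hasher).result
end

pp stable_hash(Foo.new("bar"))
```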