This improves its performance, both in time (always) and memory (generally).
…Fixes #4557
## The algorithm
The algorithm is basically the one that [Ruby](https://github.com/ruby/ruby/blob/master/st.c#L1-L101
) uses.
## How to review this
There's just a single commit. My recommendation is to first look at the final code, which is thoroughly documented (the important bits are at the top), then at the diff.
I tried to document this really well because otherwise in a month I won't remember any of this. It also enables anyone to understand how it works and possibly keep optimizing things.
## Why?
The old Hash implementation uses closed addressing: buckets with linked lists. This apparently isn't great because of pointer chasing and not having good cache locality. Using open addressing removes pointer chasing and improves cache locality. It could just be a theory but empirically it performs much better.
## Give me some benchmarks!
I used this code as a benchmark:
<details>
<summary>Code for benchmarking old Hash vs. new Hash</summary>
```crystal
require "benchmark"
require "./old_hash"
require "random/secure"
def benchmark_create_empty
Benchmark.ips do |x|
x.report("old, create") do
OldHash(Int32, Int32).new
end
x.report("new, create") do
Hash(String, String).new
end
end
end
def benchmark_insert_strings(sizes)
sizes.each do |size|
values = Array.new(size) { Random::Secure.hex }
Benchmark.ips do |x|
x.report("old, insert (type: String, size: #{size})") do
hash = OldHash(String, String).new
values.each do |value|
hash[value] = value
end
end
x.report("new, insert (type: String, size: #{size})") do
hash = Hash(String, String).new
values.each do |value|
hash[value] = value
end
end
end
end
end
def benchmark_insert_ints(sizes)
sizes.each do |size|
values = Array.new(size) { rand(Int32) }
Benchmark.ips do |x|
x.report("old, insert (type: Int32, size: #{size})") do
hash = OldHash(Int32, Int32).new
values.each do |value|
hash[value] = value
end
end
x.report("new, insert (type: Int32, size: #{size})") do
hash = Hash(Int32, Int32).new
values.each do |value|
hash[value] = value
end
end
end
end
end
def benchmark_read_strings(sizes)
sizes.each do |size|
values = Array.new(size) { Random::Secure.hex }
old_hash = OldHash(String, String).new
new_hash = Hash(String, String).new
values.each do |value|
old_hash[value] = value
new_hash[value] = value
end
Benchmark.ips do |x|
x.report("old, read (type: String, size: #{size})") do
values.each do |value|
old_hash[value]
end
end
x.report("new, read (type: String, size: #{size})") do
values.each do |value|
new_hash[value]
end
end
end
end
end
def benchmark_read_ints(sizes)
sizes.each do |size|
values = Array.new(size) { rand(1_00_000) }
old_hash = OldHash(Int32, Int32).new
new_hash = Hash(Int32, Int32).new
values.each do |value|
old_hash[value] = value
new_hash[value] = value
end
Benchmark.ips do |x|
x.report("old, read (type: Int32, size: #{size})") do
values.each do |value|
old_hash[value]
end
end
x.report("new, read (type: Int32, size: #{size})") do
values.each do |value|
new_hash[value]
end
end
end
end
end
sizes = [5, 10, 15, 20, 30, 50, 100, 200, 500, 1_000, 10_000, 100_000]
benchmark_create_empty()
puts
benchmark_insert_strings(sizes)
puts
benchmark_insert_ints(sizes)
puts
benchmark_read_strings(sizes)
puts
benchmark_read_ints(sizes)
```
</details>
Results:
```
old, create 18.50M ( 54.06ns) (± 1.82%) 160B/op 2.34× slower
new, create 43.32M ( 23.08ns) (± 0.88%) 64.0B/op fastest
old, insert (type: String, size: 5) 3.60M (277.58ns) (± 7.01%) 480B/op 1.38× slower
new, insert (type: String, size: 5) 4.97M (201.28ns) (± 4.77%) 384B/op fastest
old, insert (type: String, size: 10) 1.95M (513.74ns) (± 1.64%) 801B/op 1.33× slower
new, insert (type: String, size: 10) 2.59M (386.02ns) (± 3.01%) 832B/op fastest
old, insert (type: String, size: 15) 1.27M (786.17ns) (± 3.56%) 1.09kB/op 1.55× slower
new, insert (type: String, size: 15) 1.98M (505.71ns) (± 4.48%) 832B/op fastest
old, insert (type: String, size: 20) 922.03k ( 1.08µs) (± 0.99%) 1.41kB/op 1.37× slower
new, insert (type: String, size: 20) 1.27M (788.95ns) (± 2.86%) 1.67kB/op fastest
old, insert (type: String, size: 30) 639.38k ( 1.56µs) (± 5.63%) 2.03kB/op 1.58× slower
new, insert (type: String, size: 30) 1.01M (992.38ns) (± 6.01%) 1.67kB/op fastest
old, insert (type: String, size: 50) 355.49k ( 2.81µs) (± 5.77%) 3.28kB/op 1.49× slower
new, insert (type: String, size: 50) 530.85k ( 1.88µs) (± 3.95%) 3.81kB/op fastest
old, insert (type: String, size: 100) 154.93k ( 6.45µs) (± 4.04%) 7.06kB/op 1.81× slower
new, insert (type: String, size: 100) 280.47k ( 3.57µs) (± 3.92%) 7.09kB/op fastest
old, insert (type: String, size: 200) 71.71k ( 13.94µs) (± 3.27%) 13.3kB/op 1.91× slower
new, insert (type: String, size: 200) 136.71k ( 7.31µs) (± 2.21%) 14.5kB/op fastest
old, insert (type: String, size: 500) 23.46k ( 42.62µs) (± 3.53%) 36.1kB/op 2.46× slower
new, insert (type: String, size: 500) 57.82k ( 17.30µs) (± 4.78%) 28.4kB/op fastest
old, insert (type: String, size: 1000) 12.78k ( 78.27µs) (± 1.72%) 67.4kB/op 2.20× slower
new, insert (type: String, size: 1000) 28.12k ( 35.56µs) (± 4.20%) 56.4kB/op fastest
old, insert (type: String, size: 10000) 1.00k (997.78µs) (± 6.41%) 662kB/op 2.12× slower
new, insert (type: String, size: 10000) 2.13k (469.91µs) (± 5.39%) 896kB/op fastest
old, insert (type: Int32, size: 5) 4.83M (207.19ns) (± 0.80%) 400B/op 1.78× slower
new, insert (type: Int32, size: 5) 8.60M (116.23ns) (± 1.76%) 240B/op fastest
old, insert (type: Int32, size: 10) 2.78M (359.30ns) (± 0.91%) 640B/op 1.72× slower
new, insert (type: Int32, size: 10) 4.79M (208.73ns) (± 1.11%) 448B/op fastest
old, insert (type: Int32, size: 15) 1.90M (525.28ns) (± 2.96%) 880B/op 1.75× slower
new, insert (type: Int32, size: 15) 3.33M (300.60ns) (± 5.98%) 448B/op fastest
old, insert (type: Int32, size: 20) 1.30M (769.02ns) (± 6.52%) 1.09kB/op 1.54× slower
new, insert (type: Int32, size: 20) 2.00M (500.54ns) (± 4.76%) 976B/op fastest
old, insert (type: Int32, size: 30) 972.78k ( 1.03µs) (± 4.94%) 1.56kB/op 1.72× slower
new, insert (type: Int32, size: 30) 1.67M (597.26ns) (± 4.29%) 976B/op fastest
old, insert (type: Int32, size: 50) 582.06k ( 1.72µs) (± 5.23%) 2.5kB/op 1.50× slower
new, insert (type: Int32, size: 50) 872.34k ( 1.15µs) (± 7.97%) 1.88kB/op fastest
old, insert (type: Int32, size: 100) 234.48k ( 4.26µs) (± 7.09%) 5.5kB/op 2.08× slower
new, insert (type: Int32, size: 100) 488.20k ( 2.05µs) (± 6.79%) 4.14kB/op fastest
old, insert (type: Int32, size: 200) 126.98k ( 7.88µs) (± 4.74%) 10.2kB/op 1.84× slower
new, insert (type: Int32, size: 200) 233.78k ( 4.28µs) (± 7.45%) 8.47kB/op fastest
old, insert (type: Int32, size: 500) 40.86k ( 24.47µs) (± 4.14%) 28.3kB/op 2.55× slower
new, insert (type: Int32, size: 500) 104.34k ( 9.58µs) (± 7.55%) 16.5kB/op fastest
old, insert (type: Int32, size: 1000) 17.87k ( 55.97µs) (± 3.24%) 51.8kB/op 2.57× slower
new, insert (type: Int32, size: 1000) 45.98k ( 21.75µs) (± 7.72%) 32.5kB/op fastest
old, insert (type: Int32, size: 10000) 1.52k (658.91µs) (± 3.88%) 506kB/op 2.16× slower
new, insert (type: Int32, size: 10000) 3.28k (305.04µs) (± 4.09%) 513kB/op fastest
old, read (type: String, size: 5) 11.64M ( 85.90ns) (± 6.51%) 0.0B/op 2.25× slower
new, read (type: String, size: 5) 26.20M ( 38.16ns) (± 2.98%) 0.0B/op fastest
old, read (type: String, size: 10) 5.69M (175.80ns) (± 4.64%) 0.0B/op 1.39× slower
new, read (type: String, size: 10) 7.93M (126.07ns) (± 6.97%) 0.0B/op fastest
old, read (type: String, size: 15) 4.15M (240.69ns) (± 2.01%) 0.0B/op 1.16× slower
new, read (type: String, size: 15) 4.82M (207.47ns) (± 3.30%) 0.0B/op fastest
old, read (type: String, size: 20) 2.93M (341.71ns) (± 1.47%) 0.0B/op 1.52× slower
new, read (type: String, size: 20) 4.46M (224.21ns) (± 1.25%) 0.0B/op fastest
old, read (type: String, size: 30) 1.87M (534.24ns) (± 2.90%) 0.0B/op 1.45× slower
new, read (type: String, size: 30) 2.71M (369.30ns) (± 1.90%) 0.0B/op fastest
old, read (type: String, size: 50) 992.08k ( 1.01µs) (± 4.51%) 0.0B/op 1.71× slower
new, read (type: String, size: 50) 1.70M (589.77ns) (± 1.65%) 0.0B/op fastest
old, read (type: String, size: 100) 596.10k ( 1.68µs) (± 1.60%) 0.0B/op 1.40× slower
new, read (type: String, size: 100) 832.91k ( 1.20µs) (± 1.82%) 0.0B/op fastest
old, read (type: String, size: 200) 260.30k ( 3.84µs) (± 1.63%) 0.0B/op 1.53× slower
new, read (type: String, size: 200) 398.84k ( 2.51µs) (± 1.84%) 0.0B/op fastest
old, read (type: String, size: 500) 111.03k ( 9.01µs) (± 2.89%) 0.0B/op 1.40× slower
new, read (type: String, size: 500) 155.71k ( 6.42µs) (± 3.33%) 0.0B/op fastest
old, read (type: String, size: 1000) 41.59k ( 24.04µs) (± 3.07%) 0.0B/op 1.71× slower
new, read (type: String, size: 1000) 71.27k ( 14.03µs) (± 3.81%) 0.0B/op fastest
old, read (type: String, size: 10000) 2.66k (376.32µs) (± 4.55%) 0.0B/op 2.46× slower
new, read (type: String, size: 10000) 6.54k (152.95µs) (± 4.03%) 0.0B/op fastest
old, read (type: Int32, size: 5) 23.67M ( 42.24ns) (± 1.64%) 0.0B/op 2.30× slower
new, read (type: Int32, size: 5) 54.35M ( 18.40ns) (± 3.14%) 0.0B/op fastest
old, read (type: Int32, size: 10) 11.66M ( 85.77ns) (± 4.63%) 0.0B/op 1.85× slower
new, read (type: Int32, size: 10) 21.58M ( 46.33ns) (± 3.20%) 0.0B/op fastest
old, read (type: Int32, size: 15) 7.81M (128.00ns) (± 4.15%) 0.0B/op 1.49× slower
new, read (type: Int32, size: 15) 11.63M ( 86.00ns) (± 3.99%) 0.0B/op fastest
old, read (type: Int32, size: 20) 5.80M (172.48ns) (± 5.34%) 0.0B/op 1.73× slower
new, read (type: Int32, size: 20) 10.02M ( 99.77ns) (± 4.54%) 0.0B/op fastest
old, read (type: Int32, size: 30) 3.67M (272.51ns) (± 2.99%) 0.0B/op 1.84× slower
new, read (type: Int32, size: 30) 6.74M (148.30ns) (± 1.97%) 0.0B/op fastest
old, read (type: Int32, size: 50) 2.05M (488.04ns) (± 5.57%) 0.0B/op 1.89× slower
new, read (type: Int32, size: 50) 3.87M (258.41ns) (± 4.24%) 0.0B/op fastest
old, read (type: Int32, size: 100) 1.09M (921.59ns) (± 8.72%) 0.0B/op 1.70× slower
new, read (type: Int32, size: 100) 1.84M (542.80ns) (± 5.52%) 0.0B/op fastest
old, read (type: Int32, size: 200) 535.83k ( 1.87µs) (± 5.84%) 0.0B/op 1.66× slower
new, read (type: Int32, size: 200) 891.31k ( 1.12µs) (± 5.49%) 0.0B/op fastest
old, read (type: Int32, size: 500) 236.68k ( 4.23µs) (± 3.61%) 0.0B/op 1.52× slower
new, read (type: Int32, size: 500) 360.85k ( 2.77µs) (± 4.31%) 0.0B/op fastest
old, read (type: Int32, size: 1000) 106.13k ( 9.42µs) (± 4.88%) 0.0B/op 1.66× slower
new, read (type: Int32, size: 1000) 175.92k ( 5.68µs) (± 4.08%) 0.0B/op fastest
old, read (type: Int32, size: 10000) 4.60k (217.62µs) (± 2.94%) 0.0B/op 3.00× slower
new, read (type: Int32, size: 10000) 13.80k ( 72.47µs) (± 1.67%) 0.0B/op fastest
```
As you can see the new implementation is always faster than the old one. Sometimes more memory is used, sometimes less.
I also ran some @kostya [benchmarks](https://github.com/kostya/benchmarks) that used Hash in their implementation. Here are the results:
#### [Havlak](https://github.com/kostya/benchmarks/blob/master/havlak/havlak.cr):
```
old: 12.49s, 375.1Mb
new: 7.58s, 215.7Mb
```
Havlak seems to be a benchmark measuring how well a language performs in general algorithmic tasks... the new results look good! 😊
#### [Brainfuck](https://github.com/kostya/benchmarks/blob/master/brainfuck/brainfuck.cr):
```
old: 5.20s, 1.8Mb
new: 4.22s, 1.8Mb
```
#### [JSON](https://github.com/kostya/benchmarks/blob/master/json/test.cr) (when using `JSON.parse`):
```
old: 2.20s, 1137.0Mb
new: 2.07s, 961.3Mb
```
#### [Knucleotide](https://github.com/kostya/crystal-benchmarks-game/blob/master/knucleotide/knucleotide.cr):
```
old: 1.63s, 26.5Mb
new: 1.01s, 32.4Mb
```
Then some more benchmarks...
There's `HTTP::Request#from_io` which I [recently optimized](https://github.com/crystal-lang/crystal/pull/8002):
```
old: from_io 498.47k ( 2.01µs) (± 0.89%) 816B/op fastest
new: from_io 549.22k ( 1.82µs) (± 1.89%) 720B/op fastest
```
(using `wrk` against the sample http server increases the requests/sec from 118355.73 to about 122000)
Also for curiosity I compared creating a Hash with 1_000_000 elements and seeing how it compares to Ruby and the old Hash.
<details>
<summary>Code for creating a Hash with 1_000_000 elements in Ruby and Crystal</summary>
```crystal
size = 1_000_000
h = {0 => 0}
time = Time.now
size.times do |i|
h[i] = i
end
puts "Insert: #{Time.now - time}"
time = Time.now
size.times do |i|
h[i]
end
puts "Read: #{Time.now - time}"
```
</details>
Results:
```
Ruby 2.7-dev:
Insert: 0.151813
Read: 0.13749
Crystal old:
Insert: 0.238662
Read: 0.129462
Crystal new:
Insert: 0.070804
Read: 0.041008
```
Ruby was faster than Crystal! Ruby is simply amazing ❤️. But now Crystal is even faster!
## The compiler uses Hash all over the place!
So compile times now should go down! Right? ... Right?
Well, unfortunately no. I think the main reason is that the times are bound by the number of method instances, not by the performance of the many hashes used.
Memory consumption did seem to go down a bit.
## When will I have this?
~We could push this to 0.31.0. Or... we could have it in 0.30.0 if we delay the release a bit more (note: I don't manage releases, but if the community doesn't mind waiting a bit to get a huge performance boost in their apps then I think we could relax the release date).~
In 0.31.0.
## Final thoughts
1. Thank you Ruby! ❤️ ❤️ ❤️
1. Thank you Vladimir Makarov! ❤️ ❤️ ❤️
1. Algorithms are cool!
1. There's an optimization in the original Ruby algorithm which uses bitmasks instead of `remainder` or `%` to fit a number inside a range... I did that as almost the last thing because I didn't believe it would improve performance a lot... and it doubled the performance! 😮
1. Feel free to target this PR and benchmark your code and post any interesting speedups here!
1. I hope CI 32 bits passes! 😊