Does omitting types in method parameters affect compiler speed?

girng_github · October 12, 2019, 3:05am

I’ve always been curious about this!

For example:

# NOT SPECIFYING TYPES IN PARAMETERS
#class Client
#  property username : String
#  property id : Int32
#  
#  def initialize(username, id)
#    @username = username
#    @id = id
#  end
#end

# SPECIFYING TYPES IN PARAMETERS
class Client
  property username : String
  property id : Int32
  
  def initialize(username : String, id : Int32)
    @username = username
    @id = id
  end
end


p = Client.new("Muffin", 9000)

pp! p

Both variations of Client compile without issue. And I don’t think just ONE class would affect the compiling process, however… it got me thinking; does it affect it at a large scale? Let’s say you have 50, 70 or 100+ methods that don’t specify the types in parameters. Does it make a difference?

vlazar · October 12, 2019, 5:52am

When you have a question about speed, the best thing to do is ask a computer.

No matter how smart you think you are, you will be just making assumptions of how fast this piece of code is or will x be faster than y.

Sure, you can use Big-O notation to some extent but the best thing is still to observe and make conclusions. This is how science works.

So make your experiments and let those theories meet the reality. Learn from it!

See for example this cool optimization done by Ary for Hashes which is landed in Crystal 0.31.0

github.com/crystal-lang/crystal

Hash now uses an open addressing algorithm

crystal-lang:master ← asterite:open-addressing-hash

opened 10:30PM - 30 Jul 19 UTC

asterite

+1138 -230

This improves its performance, both in time (always) and memory (generally). …Fixes #4557 ## The algorithm The algorithm is basically the one that [Ruby](https://github.com/ruby/ruby/blob/master/st.c#L1-L101 ) uses. ## How to review this There's just a single commit. My recommendation is to first look at the final code, which is thoroughly documented (the important bits are at the top), then at the diff. I tried to document this really well because otherwise in a month I won't remember any of this. It also enables anyone to understand how it works and possibly keep optimizing things. ## Why? The old Hash implementation uses closed addressing: buckets with linked lists. This apparently isn't great because of pointer chasing and not having good cache locality. Using open addressing removes pointer chasing and improves cache locality. It could just be a theory but empirically it performs much better. ## Give me some benchmarks! I used this code as a benchmark: <details> <summary>Code for benchmarking old Hash vs. new Hash</summary> ```crystal require "benchmark" require "./old_hash" require "random/secure" def benchmark_create_empty Benchmark.ips do |x| x.report("old, create") do OldHash(Int32, Int32).new end x.report("new, create") do Hash(String, String).new end end end def benchmark_insert_strings(sizes) sizes.each do |size| values = Array.new(size) { Random::Secure.hex } Benchmark.ips do |x| x.report("old, insert (type: String, size: #{size})") do hash = OldHash(String, String).new values.each do |value| hash[value] = value end end x.report("new, insert (type: String, size: #{size})") do hash = Hash(String, String).new values.each do |value| hash[value] = value end end end end end def benchmark_insert_ints(sizes) sizes.each do |size| values = Array.new(size) { rand(Int32) } Benchmark.ips do |x| x.report("old, insert (type: Int32, size: #{size})") do hash = OldHash(Int32, Int32).new values.each do |value| hash[value] = value end end x.report("new, insert (type: Int32, size: #{size})") do hash = Hash(Int32, Int32).new values.each do |value| hash[value] = value end end end end end def benchmark_read_strings(sizes) sizes.each do |size| values = Array.new(size) { Random::Secure.hex } old_hash = OldHash(String, String).new new_hash = Hash(String, String).new values.each do |value| old_hash[value] = value new_hash[value] = value end Benchmark.ips do |x| x.report("old, read (type: String, size: #{size})") do values.each do |value| old_hash[value] end end x.report("new, read (type: String, size: #{size})") do values.each do |value| new_hash[value] end end end end end def benchmark_read_ints(sizes) sizes.each do |size| values = Array.new(size) { rand(1_00_000) } old_hash = OldHash(Int32, Int32).new new_hash = Hash(Int32, Int32).new values.each do |value| old_hash[value] = value new_hash[value] = value end Benchmark.ips do |x| x.report("old, read (type: Int32, size: #{size})") do values.each do |value| old_hash[value] end end x.report("new, read (type: Int32, size: #{size})") do values.each do |value| new_hash[value] end end end end end sizes = [5, 10, 15, 20, 30, 50, 100, 200, 500, 1_000, 10_000, 100_000] benchmark_create_empty() puts benchmark_insert_strings(sizes) puts benchmark_insert_ints(sizes) puts benchmark_read_strings(sizes) puts benchmark_read_ints(sizes) ``` </details> Results: ``` old, create 18.50M ( 54.06ns) (± 1.82%) 160B/op 2.34× slower new, create 43.32M ( 23.08ns) (± 0.88%) 64.0B/op fastest old, insert (type: String, size: 5) 3.60M (277.58ns) (± 7.01%) 480B/op 1.38× slower new, insert (type: String, size: 5) 4.97M (201.28ns) (± 4.77%) 384B/op fastest old, insert (type: String, size: 10) 1.95M (513.74ns) (± 1.64%) 801B/op 1.33× slower new, insert (type: String, size: 10) 2.59M (386.02ns) (± 3.01%) 832B/op fastest old, insert (type: String, size: 15) 1.27M (786.17ns) (± 3.56%) 1.09kB/op 1.55× slower new, insert (type: String, size: 15) 1.98M (505.71ns) (± 4.48%) 832B/op fastest old, insert (type: String, size: 20) 922.03k ( 1.08µs) (± 0.99%) 1.41kB/op 1.37× slower new, insert (type: String, size: 20) 1.27M (788.95ns) (± 2.86%) 1.67kB/op fastest old, insert (type: String, size: 30) 639.38k ( 1.56µs) (± 5.63%) 2.03kB/op 1.58× slower new, insert (type: String, size: 30) 1.01M (992.38ns) (± 6.01%) 1.67kB/op fastest old, insert (type: String, size: 50) 355.49k ( 2.81µs) (± 5.77%) 3.28kB/op 1.49× slower new, insert (type: String, size: 50) 530.85k ( 1.88µs) (± 3.95%) 3.81kB/op fastest old, insert (type: String, size: 100) 154.93k ( 6.45µs) (± 4.04%) 7.06kB/op 1.81× slower new, insert (type: String, size: 100) 280.47k ( 3.57µs) (± 3.92%) 7.09kB/op fastest old, insert (type: String, size: 200) 71.71k ( 13.94µs) (± 3.27%) 13.3kB/op 1.91× slower new, insert (type: String, size: 200) 136.71k ( 7.31µs) (± 2.21%) 14.5kB/op fastest old, insert (type: String, size: 500) 23.46k ( 42.62µs) (± 3.53%) 36.1kB/op 2.46× slower new, insert (type: String, size: 500) 57.82k ( 17.30µs) (± 4.78%) 28.4kB/op fastest old, insert (type: String, size: 1000) 12.78k ( 78.27µs) (± 1.72%) 67.4kB/op 2.20× slower new, insert (type: String, size: 1000) 28.12k ( 35.56µs) (± 4.20%) 56.4kB/op fastest old, insert (type: String, size: 10000) 1.00k (997.78µs) (± 6.41%) 662kB/op 2.12× slower new, insert (type: String, size: 10000) 2.13k (469.91µs) (± 5.39%) 896kB/op fastest old, insert (type: Int32, size: 5) 4.83M (207.19ns) (± 0.80%) 400B/op 1.78× slower new, insert (type: Int32, size: 5) 8.60M (116.23ns) (± 1.76%) 240B/op fastest old, insert (type: Int32, size: 10) 2.78M (359.30ns) (± 0.91%) 640B/op 1.72× slower new, insert (type: Int32, size: 10) 4.79M (208.73ns) (± 1.11%) 448B/op fastest old, insert (type: Int32, size: 15) 1.90M (525.28ns) (± 2.96%) 880B/op 1.75× slower new, insert (type: Int32, size: 15) 3.33M (300.60ns) (± 5.98%) 448B/op fastest old, insert (type: Int32, size: 20) 1.30M (769.02ns) (± 6.52%) 1.09kB/op 1.54× slower new, insert (type: Int32, size: 20) 2.00M (500.54ns) (± 4.76%) 976B/op fastest old, insert (type: Int32, size: 30) 972.78k ( 1.03µs) (± 4.94%) 1.56kB/op 1.72× slower new, insert (type: Int32, size: 30) 1.67M (597.26ns) (± 4.29%) 976B/op fastest old, insert (type: Int32, size: 50) 582.06k ( 1.72µs) (± 5.23%) 2.5kB/op 1.50× slower new, insert (type: Int32, size: 50) 872.34k ( 1.15µs) (± 7.97%) 1.88kB/op fastest old, insert (type: Int32, size: 100) 234.48k ( 4.26µs) (± 7.09%) 5.5kB/op 2.08× slower new, insert (type: Int32, size: 100) 488.20k ( 2.05µs) (± 6.79%) 4.14kB/op fastest old, insert (type: Int32, size: 200) 126.98k ( 7.88µs) (± 4.74%) 10.2kB/op 1.84× slower new, insert (type: Int32, size: 200) 233.78k ( 4.28µs) (± 7.45%) 8.47kB/op fastest old, insert (type: Int32, size: 500) 40.86k ( 24.47µs) (± 4.14%) 28.3kB/op 2.55× slower new, insert (type: Int32, size: 500) 104.34k ( 9.58µs) (± 7.55%) 16.5kB/op fastest old, insert (type: Int32, size: 1000) 17.87k ( 55.97µs) (± 3.24%) 51.8kB/op 2.57× slower new, insert (type: Int32, size: 1000) 45.98k ( 21.75µs) (± 7.72%) 32.5kB/op fastest old, insert (type: Int32, size: 10000) 1.52k (658.91µs) (± 3.88%) 506kB/op 2.16× slower new, insert (type: Int32, size: 10000) 3.28k (305.04µs) (± 4.09%) 513kB/op fastest old, read (type: String, size: 5) 11.64M ( 85.90ns) (± 6.51%) 0.0B/op 2.25× slower new, read (type: String, size: 5) 26.20M ( 38.16ns) (± 2.98%) 0.0B/op fastest old, read (type: String, size: 10) 5.69M (175.80ns) (± 4.64%) 0.0B/op 1.39× slower new, read (type: String, size: 10) 7.93M (126.07ns) (± 6.97%) 0.0B/op fastest old, read (type: String, size: 15) 4.15M (240.69ns) (± 2.01%) 0.0B/op 1.16× slower new, read (type: String, size: 15) 4.82M (207.47ns) (± 3.30%) 0.0B/op fastest old, read (type: String, size: 20) 2.93M (341.71ns) (± 1.47%) 0.0B/op 1.52× slower new, read (type: String, size: 20) 4.46M (224.21ns) (± 1.25%) 0.0B/op fastest old, read (type: String, size: 30) 1.87M (534.24ns) (± 2.90%) 0.0B/op 1.45× slower new, read (type: String, size: 30) 2.71M (369.30ns) (± 1.90%) 0.0B/op fastest old, read (type: String, size: 50) 992.08k ( 1.01µs) (± 4.51%) 0.0B/op 1.71× slower new, read (type: String, size: 50) 1.70M (589.77ns) (± 1.65%) 0.0B/op fastest old, read (type: String, size: 100) 596.10k ( 1.68µs) (± 1.60%) 0.0B/op 1.40× slower new, read (type: String, size: 100) 832.91k ( 1.20µs) (± 1.82%) 0.0B/op fastest old, read (type: String, size: 200) 260.30k ( 3.84µs) (± 1.63%) 0.0B/op 1.53× slower new, read (type: String, size: 200) 398.84k ( 2.51µs) (± 1.84%) 0.0B/op fastest old, read (type: String, size: 500) 111.03k ( 9.01µs) (± 2.89%) 0.0B/op 1.40× slower new, read (type: String, size: 500) 155.71k ( 6.42µs) (± 3.33%) 0.0B/op fastest old, read (type: String, size: 1000) 41.59k ( 24.04µs) (± 3.07%) 0.0B/op 1.71× slower new, read (type: String, size: 1000) 71.27k ( 14.03µs) (± 3.81%) 0.0B/op fastest old, read (type: String, size: 10000) 2.66k (376.32µs) (± 4.55%) 0.0B/op 2.46× slower new, read (type: String, size: 10000) 6.54k (152.95µs) (± 4.03%) 0.0B/op fastest old, read (type: Int32, size: 5) 23.67M ( 42.24ns) (± 1.64%) 0.0B/op 2.30× slower new, read (type: Int32, size: 5) 54.35M ( 18.40ns) (± 3.14%) 0.0B/op fastest old, read (type: Int32, size: 10) 11.66M ( 85.77ns) (± 4.63%) 0.0B/op 1.85× slower new, read (type: Int32, size: 10) 21.58M ( 46.33ns) (± 3.20%) 0.0B/op fastest old, read (type: Int32, size: 15) 7.81M (128.00ns) (± 4.15%) 0.0B/op 1.49× slower new, read (type: Int32, size: 15) 11.63M ( 86.00ns) (± 3.99%) 0.0B/op fastest old, read (type: Int32, size: 20) 5.80M (172.48ns) (± 5.34%) 0.0B/op 1.73× slower new, read (type: Int32, size: 20) 10.02M ( 99.77ns) (± 4.54%) 0.0B/op fastest old, read (type: Int32, size: 30) 3.67M (272.51ns) (± 2.99%) 0.0B/op 1.84× slower new, read (type: Int32, size: 30) 6.74M (148.30ns) (± 1.97%) 0.0B/op fastest old, read (type: Int32, size: 50) 2.05M (488.04ns) (± 5.57%) 0.0B/op 1.89× slower new, read (type: Int32, size: 50) 3.87M (258.41ns) (± 4.24%) 0.0B/op fastest old, read (type: Int32, size: 100) 1.09M (921.59ns) (± 8.72%) 0.0B/op 1.70× slower new, read (type: Int32, size: 100) 1.84M (542.80ns) (± 5.52%) 0.0B/op fastest old, read (type: Int32, size: 200) 535.83k ( 1.87µs) (± 5.84%) 0.0B/op 1.66× slower new, read (type: Int32, size: 200) 891.31k ( 1.12µs) (± 5.49%) 0.0B/op fastest old, read (type: Int32, size: 500) 236.68k ( 4.23µs) (± 3.61%) 0.0B/op 1.52× slower new, read (type: Int32, size: 500) 360.85k ( 2.77µs) (± 4.31%) 0.0B/op fastest old, read (type: Int32, size: 1000) 106.13k ( 9.42µs) (± 4.88%) 0.0B/op 1.66× slower new, read (type: Int32, size: 1000) 175.92k ( 5.68µs) (± 4.08%) 0.0B/op fastest old, read (type: Int32, size: 10000) 4.60k (217.62µs) (± 2.94%) 0.0B/op 3.00× slower new, read (type: Int32, size: 10000) 13.80k ( 72.47µs) (± 1.67%) 0.0B/op fastest ``` As you can see the new implementation is always faster than the old one. Sometimes more memory is used, sometimes less. I also ran some @kostya [benchmarks](https://github.com/kostya/benchmarks) that used Hash in their implementation. Here are the results: #### [Havlak](https://github.com/kostya/benchmarks/blob/master/havlak/havlak.cr): ``` old: 12.49s, 375.1Mb new: 7.58s, 215.7Mb ``` Havlak seems to be a benchmark measuring how well a language performs in general algorithmic tasks... the new results look good! 😊 #### [Brainfuck](https://github.com/kostya/benchmarks/blob/master/brainfuck/brainfuck.cr): ``` old: 5.20s, 1.8Mb new: 4.22s, 1.8Mb ``` #### [JSON](https://github.com/kostya/benchmarks/blob/master/json/test.cr) (when using `JSON.parse`): ``` old: 2.20s, 1137.0Mb new: 2.07s, 961.3Mb ``` #### [Knucleotide](https://github.com/kostya/crystal-benchmarks-game/blob/master/knucleotide/knucleotide.cr): ``` old: 1.63s, 26.5Mb new: 1.01s, 32.4Mb ``` Then some more benchmarks... There's `HTTP::Request#from_io` which I [recently optimized](https://github.com/crystal-lang/crystal/pull/8002): ``` old: from_io 498.47k ( 2.01µs) (± 0.89%) 816B/op fastest new: from_io 549.22k ( 1.82µs) (± 1.89%) 720B/op fastest ``` (using `wrk` against the sample http server increases the requests/sec from 118355.73 to about 122000) Also for curiosity I compared creating a Hash with 1_000_000 elements and seeing how it compares to Ruby and the old Hash. <details> <summary>Code for creating a Hash with 1_000_000 elements in Ruby and Crystal</summary> ```crystal size = 1_000_000 h = {0 => 0} time = Time.now size.times do |i| h[i] = i end puts "Insert: #{Time.now - time}" time = Time.now size.times do |i| h[i] end puts "Read: #{Time.now - time}" ``` </details> Results: ``` Ruby 2.7-dev: Insert: 0.151813 Read: 0.13749 Crystal old: Insert: 0.238662 Read: 0.129462 Crystal new: Insert: 0.070804 Read: 0.041008 ``` Ruby was faster than Crystal! Ruby is simply amazing ❤️. But now Crystal is even faster! ## The compiler uses Hash all over the place! So compile times now should go down! Right? ... Right? Well, unfortunately no. I think the main reason is that the times are bound by the number of method instances, not by the performance of the many hashes used. Memory consumption did seem to go down a bit. ## When will I have this? ~We could push this to 0.31.0. Or... we could have it in 0.30.0 if we delay the release a bit more (note: I don't manage releases, but if the community doesn't mind waiting a bit to get a huge performance boost in their apps then I think we could relax the release date).~ In 0.31.0. ## Final thoughts 1. Thank you Ruby! ❤️ ❤️ ❤️ 1. Thank you Vladimir Makarov! ❤️ ❤️ ❤️ 1. Algorithms are cool! 1. There's an optimization in the original Ruby algorithm which uses bitmasks instead of `remainder` or `%` to fit a number inside a range... I did that as almost the last thing because I didn't believe it would improve performance a lot... and it doubled the performance! 😮 1. Feel free to target this PR and benchmark your code and post any interesting speedups here! 1. I hope CI 32 bits passes! 😊

I found this part very interesting:

The compiler uses Hash all over the place!

So compile times now should go down! Right? … Right?

Well, unfortunately no. I think the main reason is that the times are bound by the number of method instances, not by the performance of the many hashes used.

Memory consumption did seem to go down a bit.

Here is assumptions about faster Hash implementation providing better compile time due to excessive usage of hashes by compiler met the reality.

The lesson learned.

So in your particular case

It is very easy to make a small program and generate tons of source files
Then do a benchmark. This will be harder as benchmarks are not so easy to do right. You have to exclude other factors which may affect results.
And then make your conclusion. This one could also be hard.

vlazar · October 12, 2019, 1:48pm

I’ve remembered reading this article a while ago which is fascinating:

https://wiki.mozilla.org/User:Jorend/Deterministic_hash_tables

A deterministic hash table proposed by Tyler Close was implemented in C++ and its performance was compared to two hash table implementations using open addressing. Speed and memory usage were measured.

Here is the cool part:

Speed. The Close table implementation was very fast, faster than the open addressing implementations. (It is unclear why; theory suggests it “should” be slower, and measurement confirms that the Close table is doing more memory accesses and more branches. More investigation is needed!)

So just benchmark your code. This is the way to go.

girng_github · October 12, 2019, 4:38pm

Hi @vlazar, thanks for the info!

If I use Benchmark.ips to compare those two class Clients would that work? Or would I need a “compiler benchmark” tool?

vlazar · October 12, 2019, 5:04pm

Since your question was about compilation speed and Benchmark.ips is a runtime thing, you can’t put classes inside Banchmark.ips and expect it to somehow measure compile speed.

The thing inside Benchmark.ips needs to be executed. So yeah, some kind of tool would help you avoid measuring compilation speed manually.

You might wanna take a look at https://github.com/manastech/benchy by Brian.

asterite · October 12, 2019, 5:27pm

My guess is that compile time difference will be negligible. Ommiting types should be slightly faster but not noticeable.

girng_github · October 12, 2019, 9:18pm

Alright sweet. Thanks for the input on this matter. I don’t feel as curious as before.

Topic		Replies	Views
Build with --release performance is slow than the 2017 crystal version? Help & Support	16	645	June 13, 2022
Very slow build speeds for hello world Help & Support	42	918	August 18, 2024
Are inferred types "frozen" for crystal shards and the standard library? Crystal Contrib	11	741	December 4, 2019
Ongoing experiment: ident pool Community	13	566	July 21, 2022
Propose: Incremental compilation Crystal Contrib	33	829	January 15, 2024

Does omitting types in method parameters affect compiler speed?

The compiler uses Hash all over the place!

Related topics