How to use Benchmark correctly?

I found that the compiler optimizes so aggressively that some calls are optimized away into empty statements entirely. A simple example:

require "benchmark"
require "math"

Benchmark.ips do |x|
  x.report("Example A") { Math.cos(1) }
  x.report("Example B") { }
end

Benchmarking without the --release flag gives:

Example A  54.12M ( 18.48ns) (± 0.44%)  0.0B/op   8.70× slower
Example B 471.05M (  2.12ns) (± 5.06%)  0.0B/op        fastest

These results are intuitive, but Crystal prompts me to use the --release flag, and then the results are:

Example A 793.13M (  1.26ns) (± 1.37%)  0.0B/op        fastest
Example B 791.37M (  1.26ns) (± 2.15%)  0.0B/op   1.00× slower

This is clearly not the desired outcome.
How do I use the benchmark correctly?

I can’t see the problem here. In my projects, I only use --release builds to run benchmarks, since I want to check the best option for a production scenario.

Maybe I’m wrong? I’m curious to see the other responses.

For me, when I benchmark, I always run the code for at least 1000 iterations to give the benchmark some warmup time.

require "benchmark"
require "math"

Benchmark.ips do |x|
  x.report("Example A") do
    1000.times do
      Math.cos(1)
    end
  end
  x.report("Example B") do
    1000.times do
      1
    end
  end
end

I’m not familiar with any of the Math functions, so it’s a bit odd to me that these would still be equal. Though I get that testing against a static value like that isn’t very helpful, I’d still expect the static value to always be the faster option.

❯ ./bench 
Example A 773.99M (  1.29ns) (± 0.61%)  0.0B/op        fastest            
Example B 772.76M (  1.29ns) (± 1.06%)  0.0B/op   1.00× slower

:thinking:

I don’t think this is actually needed. Benchmark.ips already does a 2s warmup period before it starts measuring. Having the benchmark additionally iterate 1000 times just inflates both metrics by the same factor, so I don’t think it would really make a difference.

What is the desired outcome? As it stands, it’s showing you that there is essentially no performance hit when using Math.cos, since LLVM is able to optimize it all away in this context, probably because it’s a hardcoded scalar value vs. something only known at runtime. This is more useful than the incorrect non-release version that makes you think there is one.


That’s cool! I didn’t know that.

It’s also customizable: Benchmark - Crystal 1.9.2
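For example, something like this should work if I’m reading the API docs right (warmup and calculation are both given in seconds, defaulting to 2 and 5):

require "benchmark"

# Warm up for 4 seconds and measure for 10 seconds instead of the
# defaults (2 and 5 seconds respectively).
Benchmark.ips(warmup: 4, calculation: 10) do |x|
  x.report("Example A") { Math.cos(1) }
end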


It is! The compiler will compute Math.cos(1) at compile-time, replacing it with a constant. No work done in that place, then!

So the benchmark result is actually telling you that the compiler is optimizing this.

To avoid this, try passing the argument of Math.cos as a runtime value.

require "benchmark"
require "math"

value = ARGV[0].to_f

Benchmark.ips do |x|
  x.report("Example A") { Math.cos(value) }
  x.report("Example B") { }
end

Then you run it like this:

crystal run foo.cr --release -- 0.87234

and the result for me is:

Example A 896.49M (  1.12ns) (± 1.42%)  0.0B/op   1.01× slower
Example B 908.05M (  1.10ns) (± 1.41%)  0.0B/op        fastest

So almost as fast as doing nothing, but still slightly slower than doing nothing.


If you inspect the emitted LLVM IR, you can see that both blocks compile to nothing since Math.cos has no side effects. You must write:

require "benchmark"

x = 1
y = 0.0

Benchmark.ips do |b|
  b.report("Example A") { y = Math.cos(x) }
  b.report("Example B") { }
end

The x is to ensure the argument forms a closure and cannot be optimized away by LLVM; the y is to ensure the return value cannot be optimized away. On my machine this gives:

Example A  65.08M ( 15.37ns) (± 3.94%)  0.0B/op  13.59× slower
Example B 884.50M (  1.13ns) (± 8.33%)  0.0B/op        fastest
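For reference, you can dump the IR yourself with the compiler’s --emit flag (assuming the benchmark file is named bench.cr; this should leave a bench.ll file next to the binary):

crystal build bench.cr --release --emit llvm-ir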

Considering that we don’t want the compiler to perform this kind of optimization when benchmarking, is it possible for the benchmark standard library to achieve this effect without requiring users to write such tricks themselves?

I’m not sure I follow: why wouldn’t you want benchmarks to include compiler optimizations? And what do you mean by “without requiring users to write such tricks themselves”?

Being able to write more readable code that is ultimately as efficient as lower level/less readable code is a big win. Especially when the user doesn’t need to think about it.

EDIT: NVM, I think you’re talking about this specific context where the values are all known at compile time, whereas in real code they would be runtime values.

1 Like

For example, suppose there are two methods: method1 and method2.
We want the compiler to optimize these two methods to be as fast as possible, and rightly so.
Then we wrote the following benchmark to find out which implementation performed better:

Benchmark.ips do |b|
  b.report("Method 1") { method1 }
  b.report("Method 2") { method2 }
end

However, the compiler was smart enough to find that method1 and method2 had no side effects, and we didn’t use their results, so it optimized the above code like this:

Benchmark.ips do |b|
  b.report("Method 1") { }
  b.report("Method 2") { }
end

This is clearly not the result we expected.

To be clear, I only want to prevent the compiler from making this particular optimization, not stop it from optimizing method1 and method2 themselves.
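As a rough sketch of what such a helper might look like (purely hypothetical, nothing like this exists in the standard library), the closure trick from the earlier post could be packaged into a generic method so users don’t have to write it by hand:

require "benchmark"

# Hypothetical helper, NOT part of Crystal's stdlib: `input` and `sink`
# are closured variables, so LLVM can't prove the block is dead code
# and delete it. This is the same x/y trick shown earlier in the thread.
def opaque_report(bench, label : String, input : T, &block : T -> U) forall T, U
  sink = block.call(input) # run once so sink gets a concrete type
  bench.report(label) { sink = block.call(input) }
end

value = ARGV[0].to_f

Benchmark.ips do |b|
  opaque_report(b, "Method 1", value) { |v| Math.cos(v) }
  opaque_report(b, "Method 2", value) { |v| Math.sin(v) }
end

Whether a closured write is enough to defeat LLVM in every situation is another question; other ecosystems expose a dedicated black_box or DoNotOptimize primitive for exactly this reason.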