Crystal 0.34.0 slower?

Thanks for the details @gummybears, that looks like a very cool project.

There’s not much to go on from your methods, as you mentioned, but I’m not convinced this is just a problem with your program. I modified Kostya’s crystal-benchmarks-game repo to compare Crystal versions instead (https://github.com/mgomes/crystal-bench).

Here are my results:

### Binarytrees

| Version | Time, sec | Memory, MB |
| ------- | --------: | ---------: |
| 0.30.0  | 1.77      | 49.0       |
| 0.34.0  | 1.83      | 48.9       |

### Fannkuchredux

| Version | Time, sec | Memory, MB |
| ------- | --------: | ---------: |
| 0.30.0  | 2.92      | 2.9        |
| 0.34.0  | 3.63      | 2.8        |

### Fasta

| Version | Time, sec | Memory, MB |
| ------- | --------: | ---------: |
| 0.30.0  | 2.22      | 3.8        |
| 0.34.0  | 4.51      | 4.0        |

### Mandelbrot

| Version | Time, sec | Memory, MB |
| ------- | --------: | ---------: |
| 0.30.0  | 8.40      | 2.9        |
| 0.34.0  | 8.27      | 3.0        |

### Meteor

| Version | Time, sec | Memory, MB |
| ------- | --------: | ---------: |
| 0.30.0  | 0.14      | 3.4        |
| 0.34.0  | 0.13      | 3.3        |

### Nbody

| Version | Time, sec | Memory, MB |
| ------- | --------: | ---------: |
| 0.30.0  | 1.13      | 3.2        |
| 0.34.0  | 1.44      | 3.3        |

### Pidigits

| Version | Time, sec | Memory, MB |
| ------- | --------: | ---------: |
| 0.30.0  | 9.52      | 8.2        |
| 0.34.0  | 8.96      | 7.4        |

### Regexdna

| Version | Time, sec | Memory, MB |
| ------- | --------: | ---------: |
| 0.30.0  | 1.14      | 20.0       |
| 0.34.0  | 1.22      | 19.3       |

### Revcomp

| Version | Time, sec | Memory, MB |
| ------- | --------: | ---------: |
| 0.30.0  | 1.89      | 365.6      |
| 0.34.0  | 2.25      | 339.0      |

### Spectralnorm

| Version | Time, sec | Memory, MB |
| ------- | --------: | ---------: |
| 0.30.0  | 1.75      | 4.3        |
| 0.34.0  | 1.82      | 4.3        |

### Threadring

| Version | Time, sec | Memory, MB |
| ------- | --------: | ---------: |
| 0.30.0  | 0.37      | 5.2        |
| 0.34.0  | 0.51      | 5.0        |

Not all the tests are slower, but some, like Fasta, are quite a bit slower. I suspect the answer is in there somewhere. I still have to look more closely at the slower programs.

If you compile fasta with Crystal 0.33.0 and run it with n=15000000, you get:

```
3.04 real         2.24 user         0.12 sys
```

If you compile it with `-D disable_overflow` and run it, you get:

```
2.19 real         1.97 user         0.13 sys
```

From this, it’s pretty obvious that the slowness comes from overflow checking.
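For reference, the comparison can be reproduced roughly like this (assuming the fasta.cr from the benchmarks repo, which takes n as a command-line argument; exact paths may differ):

```
$ crystal build fasta.cr --release
$ time ./fasta 15000000 > /dev/null

$ crystal build fasta.cr --release -D disable_overflow
$ time ./fasta 15000000 > /dev/null
```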

@gummybears Even if you changed your code to use &+ instead of +, I’m sure you didn’t change every occurrence of + and - in the standard library. You’d have to change it everywhere, which is of course impossible.
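To make the distinction concrete, here’s a minimal sketch of the two operator families:

```crystal
x = Int32::MAX

p(x &+ 1) # => -2147483648: &+ wraps around silently, never checked
p(x + 1)  # raises OverflowError: plain + is checked by default
```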

My conclusion is one of these:

  1. Get used to the new performance. The old performance was deceptive, because programs might have been silently wrong due to unchecked overflows.
  2. It might still be good for Crystal to provide a flag to disable overflow checking. Rust removes overflow checks when compiling in release mode; the idea is that once you’ve made sure in development that nothing overflows, you disable the runtime checks for greater performance.

I believe doing 2 is very, very important if Crystal wants to succeed in more math-heavy territories, where performance is critical. Overflow itself is less of a concern there, because you usually use wide types like Int64 or UInt64 and the values are bounded.

I wouldn’t do it exactly like Rust, though: I would keep overflow checking on by default but still provide a flag to disable it. The core team should basically revert the PR that removed -D disable_overflow, or better, reintroduce it under another name.

4 Likes

Actually, re-running fasta with 0.33.0 with and without overflow checking gives me a difference of 0.3 seconds. I don’t know why I got 3.04 the first time.

There’s something else going on: 0.34.0 is significantly slower than 0.33.0. In my tests I see 2.41 vs 3.56 seconds, similar to what @mgomes reported.

Oh, well, never mind… I guess for fasta the issue is that it uses puts and print a lot. Setting STDOUT.flush_on_newline = false makes it faster again.
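That is, something along these lines at the top of an output-heavy program (a sketch; the gain depends on how much the program prints):

```crystal
# Buffer output instead of issuing a write for every newline.
STDOUT.flush_on_newline = false

100_000.times { |i| puts i }
STDOUT.flush # the runtime also flushes at exit; being explicit doesn't hurt
```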

There is a difference of 0.3s between overflow and no-overflow, so that might be it.

2 Likes

@gummybears Can you run your benchmark with 0.33.0 with and without -D disable_overflow?

@HankAnderson I will post the results ASAP.

@HankAnderson

I built the executable with

```
$ crystal build main.cr -o crasyvm_nooverflow.out --release --no-debug -D disable_overflow
```

and ran each test 3 times; the minimum execution time is shown, not the average.
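(For anyone who wants a self-contained stand-in for the workload: the benchmark is a Fibonacci program running in my interpreter, but a naive native fib exercises the same kind of hot integer arithmetic that overflow checks tax. The sketch below is hypothetical, not my actual code.)

```crystal
# Hypothetical stand-in for the "Fibonacci, n = 35" workload: naive
# recursive fib is almost pure Int32 arithmetic, so the per-operation
# cost of checked + and - shows up directly in the run time.
def fib(n : Int32) : Int32
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

elapsed = Time.measure { puts fib(35) } # prints 9227465
puts elapsed
```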

Observations

With Crystal 0.31.1, disabling overflow checks has a significant impact: the test is only 0.62 seconds slower than with Crystal 0.30.

From Crystal 0.32.1 up to 0.34.0, disabling overflow checks has some impact (0.25 seconds on average), but compared to Crystal 0.30 the test is on average 1.42 seconds slower.

That leaves roughly 1.17 seconds, on average, of extra work the test program is doing that is not integer overflow checking but something else.

Summary (23-05-2020)

| Crystal | 0.30 | 0.31.1 | 0.32.1 | 0.33.0 | 0.34.0 |
| --- | ---: | ---: | ---: | ---: | ---: |
| Fibonacci, n = 35 (sec) | N/A | 2.55 | 3.34 | 3.43 | 3.28 |
| Delta t, no overflow checks, compared with 18-05-2020 | | -0.88 | -0.28 | -0.28 | -0.20 |
| Delta t compared with Crystal 0.30 | | 0.62 | 1.41 | 1.50 | 1.35 |
| Compile time (sec) | N/A | 180.01 | 178.23 | 175.83 | 147.45 |
| Executable size (bytes) | N/A | 4270784 | 4097760 | 4106032 | 3929864 |


Summary (18-05-2020)
No changes were made in my codebase.

| Crystal | 0.30 | 0.31.1 | 0.32.1 | 0.33.0 | 0.34.0 |
| --- | ---: | ---: | ---: | ---: | ---: |
| Fibonacci, n = 35 (sec) | 1.93 | 3.43 | 3.62 | 3.71 | 3.48 |
| Compile time (sec) | 105.09 | 144.28 | 147.32 | 145.88 | 154.12 |
| Executable size (bytes) | 4135464 | 3960576 | 3897608 | 3913952 | 3914176 |

1 Like

Any more progress here? I’m curious as to what could have caused the performance change.

I think I’ve done all I can. I thought about bisecting the GitHub repo starting at 0.30, but decided to continue work on my interpreter and stay with Crystal 0.34.

Well, if you don’t want to bisect it and also won’t let anyone else bisect it, this is a pretty big waste of time.
I’m honestly so annoyed by this thread. Why did all these people continue the guessing game for days when there is a direct and reliable solution that’s just being dismissed?

It was a productive thread; we got a lot of different benchmarks testing the overflow feature. As for the bisecting, @gummybears isn’t referring to bisecting his app; he has said that his code has not changed. What needs to be bisected is Crystal itself between these versions, so anyone can do that work.

Based on the benchmark results, it still seems that something else caused a slowdown between versions.

I am still wondering whether anyone from the core team has any info on what performance benchmarks are run between Crystal releases. If there are none, maybe that is something we could develop? Performance is a touted and treasured feature of the language, so it would be great if it were measured between releases.

1 Like

I don’t think you understand. I was talking about perhaps bisecting the Crystal compiler repo, but decided against it because I am not familiar with the Crystal compiler’s code.

Perhaps you should read things more carefully before commenting in such a way. I am working on an interpreter. As far as I can tell, the performance problem is not with my code but with the Crystal compiler. There is no guessing game here; I did what people asked of me and provided the information.

If we’re putting things that way… no, I don’t think you understand.

You don’t need to be familiar with any code at all; that’s the great part about bisecting.

But since you’re claiming that the slowdown particularly affects your code, any measurement needs both your code (where the slowdown is reproducible) and the Crystal compiler code. So what would be measured is your code, but what would be bisected based on that is the Crystal compiler repo.
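For what it’s worth, a bisect session would look roughly like this (a sketch, assuming a reproducible benchmark such as a fib.cr and that the compiler builds at every revision; older revisions may need an older bootstrap compiler):

```
$ git clone https://github.com/crystal-lang/crystal && cd crystal
$ git bisect start 0.34.0 0.30.0    # bad revision, then good revision
$ make clean crystal                # rebuild the compiler at this revision
$ ./bin/crystal build fib.cr --release -o fib && time ./fib
$ git bisect good                   # or `git bisect bad`, then repeat
```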

Yeah, “anyone” who has both the Crystal compiler code and the code affected by the slowdown, which currently is just @gummybears.

Yes, a reproducible sample would be nice…