Thanks for the details @gummybears, that looks like a very cool project.
There’s not much to go on from your methods like you had mentioned, but I am not convinced this is just a problem with your program. I modified Kostya’s crystal-benchmarks-game repo to instead compare Crystal versions (https://github.com/mgomes/crystal-bench).
Here are my results:
### Binarytrees
Version
Time,sec
Memory, MB
0.30.0
1.77
49.0
0.34.0
1.83
48.9
Fannkuchredux
Version
Time,sec
Memory, MB
0.30.0
2.92
2.9
0.34.0
3.63
2.8
Fasta
Version
Time,sec
Memory, MB
0.30.0
2.22
3.8
0.34.0
4.51
4.0
Mandelbrot
Version
Time,sec
Memory, MB
0.30.0
8.40
2.9
0.34.0
8.27
3.0
Meteor
Version
Time,sec
Memory, MB
0.30.0
0.14
3.4
0.34.0
0.13
3.3
Nbody
Version
Time,sec
Memory, MB
0.30.0
1.13
3.2
0.34.0
1.44
3.3
Pidigits
Version
Time,sec
Memory, MB
0.30.0
9.52
8.2
0.34.0
8.96
7.4
Regexdna
Version
Time,sec
Memory, MB
0.30.0
1.14
20.0
0.34.0
1.22
19.3
Revcomp
Version
Time,sec
Memory, MB
0.30.0
1.89
365.6
0.34.0
2.25
339.0
Spectralnorm
Version
Time,sec
Memory, MB
0.30.0
1.75
4.3
0.34.0
1.82
4.3
Threadring
Version
Time,sec
Memory, MB
0.30.0
0.37
5.2
0.34.0
0.51
5.0
Not all the tests are slower but for some, like Fasta, it’s quite a bit slower. I suspect the answer is in there somewhere. I still have to look more closely at the slower programs.
If you compile fasta with Crystal 0.33.0 and run it with n=15000000, you get:
3.04 real 2.24 user 0.12 sys
If you compile it with -D disable_overflow and run it, you get:
2.19 real 1.97 user 0.13 sys
From this, it’s pretty obvious that the slowness comes from overflow checking.
@gummybears Even if you changed your code to use &+ instead of +, I’m sure you didn’t change every occurrence of + and -in the standard library. You have to change it everywhere. But that’s impossible of course.
My conclusion is one of these:
Get used to the new performance. The old performance was fake because programs might have been wrong because of silent overflows.
It might still be good for Crystal to provide a flag to disable overflow checking. Rust removes overflow checking when compiled in release mode. The idea is that once you make sure in development that nothing overflow, you disable the check at runtime for greater performance.
I believe doing 2 is very, very important if Crystal wants to be successful in more math-y territories, where performance is critical. Overflow? Not that critical because you usually use big numbers like Int64 or UInt64 and the numbers are bounded.
I wouldn’t do it like Rust. I would have overflow checking by default, but still provide a flag to disable it. The core team should basically revert the PR that removed -D disable_overflow. Or better put it with another name.
Actually, re-running fast with 0.33.0 with and without overflow gives me a difference of 0.3 seconds. I don’t know why I got 3.04 the first time.
There’s something else going on. 0.34.0 is significantly slower than 0.33.0. In my tests I see 2.41 vs 3.56, similar to what @mgomes said.
Oh, well, nevermind… I guess for fasta the thing is that it’s using puts and print a lot. Changing it to STDOUT.flush_on_newline = false makes it faster again.
There is a difference of 0.3s between oveflow and no-overflow, so that might be it.
and ran the tests 3 times each, minimum execution time showed, no averages.
Observations
Using Crystal 0.31.1 with overflow disabled has a significant
impact, and is only 0.62 seconds slower as compared to Crystal 0.30.
From Crystal 0.32.1 up to 0.34.0, disabling overflow has some
impact (0.25 seconds averaged) but compared to Crystal 0.30
the test is on average 1.42 seconds slower.
On average 1.17 seconds, the test program is doing extra work
and this is not to check integer overflows but something else.
Summary (23-05-2020)
Crystal 0.30 0.31.1 0.32.1 0.33.0 0.34.0
Fibonacci N/A 2.55 3.34 3.43 3.28
n = 35
Delta t -0.88 -0.28 -0.28 -0.20
no overflow
checks
compared with
18-05-2020
Delta t 0.62 1.41 1.50 1.35
compared with
Crystal 0.30
Other metrics
Compile N/A 180.01 178.23 175.83 147.45
Size exe N/A 4270784 4097760 4106032 3929864
Summary (18-05-2020)
No changes made in my codebase
Crystal 0.30 0.31.1 0.32.1 0.33.0 0.34.0
Fibonacci 1.93 3.43 3.62 3.71 3.48
n = 35
Other metrics
Compile 105.09 144.28 147.32 145.88 154.12
Size exe 4135464 3960576 3897608 3913952 3914176
I think I did all I can do. I thought of bisecting the github repo starting at 0.30, but decided to continue work on my interpreter and stay with the Crystal 0.34 version.
Well if you don’t want to bisect it and also won’t let anyone else bisect it, this is a pretty big waste of time.
I’m honestly so annoyed by this thread, why did all these people even continue the guessing game for days when there is a direct and reliable solution that’s just being dismissed.
It was a productive thread, we got a lot of different benchmarks testing the overflow feature. With respect to the bisecting, @gummybears isn’t referring to bisecting his app, he has said that his code has not changed. What needs to be bisected is Crystal between these versions. So anyone can do that work.
Based on the benchmark results, it does still seem to be something else that caused a slowdown between versions.
I am still wondering if anyone from the core team has any info around what performance benchmark tests are performed between Crystal releases. If none, maybe that is something that we could develop? Performance is a feature of the language that is touted and treasured, and so it would be great if it is measured between releases.
I don’t think you understand, I am talking about perhaps bisecting the Crystal compiler repo but decided against it, as I am not familiar with the code of the Crystal compiler.
Perhaps you should read things better before commenting in such a way. I am working on an interpreter. The performance problem is not with my code as far as I can tell but with the Crystal compiler. There is no guessing game here, I did what people asked me to do and provided the information.
If we’re putting things that way… No I don’t think you understand.
You don’t need to be familiar with any code at all, that’s the great part about bisecting.
But since you’re claiming that the slowdown affects your code particularly, that means any measurement needs both your code where such a slowdown is reproducible and Crystal compiler code. So, what would be measured is your code but what would be bisected based on that is Crystal compiler repo.