Crystal 0.34.0 slower?

Hi,

Finally upgraded from Crystal 0.30 to Crystal 0.34 after fixing my code. I had to make some minor changes to my code (integer division mainly). I ran all my tests and finally they are all green.
I am using the // operator to do integer divisions.

Next thing I did, was ran a simple test to compare performance. The test computes the fibonnaci number recursively for n = 35. I am a little bit disappointed as it seems my bytecode interpreter runs slower when compiled with Crystal 0.34.

Crystal 0.30 time 1.89 seconds
Crystal 0.34 time 3.29 seconds

The code is compiled with the following command

crystal build main.cr -o ../../crasy --release --no-debug

0.31.0 enabled overflow checking by default. This would most likely be caused by the overhead of that check. You could switch to the wrapping operators (pre 0.31.0 behavior) using like &* or &+, etc.

EDIT: &// isn’t a thing, updated to use a different operator.

1 Like

I changed the relevant code (+ into &+, - into &- etc) but I am not seeing any improvement.
My fibonacci test now takes 4.16 seconds.

I undid the changes (+ into &+ etc) and ran the test again, and the test completes
in 3.33 seconds with overflow checking enabled.

I don’t think the overflow checking is the problem.

Are you doing anything IO related? There was also a change related to that. If possible, maybe share your code and we can see if we can figure it out.

I am reading 3 files, so I am doing some IO, using File.each_line to read these files.
Total lines of code I am reading is less than a 1000 lines.

Writing anything to STDOUT?

Yes, 1 line of text

fibonnaci(035) : 9227465

Hm, yea I’m out of ideas then. IO change was really only related to writing output, and the overflow check is the only other thing I could think of. Hard to say without seeing the code.

My project is an interpreter, consisting of a tokenizer, lexer, parser, symbol table, semantics, type checker, a bytecode compiler and a virtual machine.
Sharing code is a bit of problem, as there is a lot of code I wrote. I know from experience that I should not tinker with the virtual machine part. That is the reason I am using this recursive fibonacci test as I can quickly determine the impact of a change to my virtual machine.

I am also out of ideas, but thanks for your help.

Without seeing the actualy code, everything’s just a wild guess.
It would also help to identify the cause if you could test your code with versions between 0.30.0 and 0.34.0 to see where exactly the performance hit occurs.

1 Like

I understand that not seeing the code is not helpful but not sure that posting 91000+ lines of code (source and spec code) will also be helpful.

I ran all test specs without any changes except the version of the Crystal compiler. I did not see any difference in performance there. The time it takes to complete these test specs are the same for both versions, 0.30 vs 0.34, around the 2 minute mark.

So I than I decided to run my fibonacci test and there is where I saw the performance difference. Yesterday I used valgrind and it seems more time is spent /usr/share/crystal/src/int.cr (0.34) as compared to 0.30 but I am not certain yet as it is hard to compare the different valgrind runs.

Hopefully I will find the problem myself, but this will take some time.

when I find myself in times of trouble
father turing comes to me
speaking words of wisdom
profile it

profile it, profile it
profile it, profile it
yeah there will be an answer
profile it

2 Likes

Not sure if it was clear: I meant testing performance with Crystal 0.31.0, 0.32.0, 0.33.0 in order to narrow it down where it changes.

I understood you the first time, it is a bit of work.

This is the plan, I will start downloading all the binaries from 0.30 up to 0.34 and per version

  1. compile my code and report the compile time and size of executable
  2. run and time all my test specs
  3. run and time the fibonacci test

Is this a good enough plan ?

Not sure if it was clear: if you profile it (on mac you can use Instruments, on other systems maybe something else), you will immediately know the cause of the slowness. No need to test against several versions, you will do it for 0.34.0 and then you will know what’s slow.

I would love to be able to do this for some of my code, but I’ve never profiled any code ever. Do you have any tutorials on how to do that with crystal code? What apps do you need to have, and commands you need to run?

A PR to add a Profiling section to Guides - Crystal would be :100: :wink:.

3 Likes

I understood what you were saying and that is what I was doing yesterday with valgrind. I also reported my initial findings but understanding valgrind’s output is not that easy, at least for me.

Not sure if your question is directed at me, but on Linux I am using valgrind.
There is not a lot I found on the internet on how to best interpret
the output of valgrind but here are the commands I use

$ valgrind --tool=callgrind ./name_of_exe

Note, valgrind is quit slow, but it produces a file
callgrind.out.PID which you can view and inspect with

$ kcachegrind callgrind.out.PID
2 Likes

I was more asking @HankAnderson since he mentioned to profile your code using instruments. I’m on mac, but I don’t know where to begin with any of that.

I’ve never used valgrind either. I know I was pointed to it before for some issues I was running in to, but to me it was like “oh, you want to fix a light switch in your house? here’s a book on electrical engineering and all of the city codes”. :laughing: Yeah, I got a bit overwhelmed and noped right out lol.

Thanks for that little snippet, it’s already a little easier to ingest than what I’ve found looking around online. As @Blacksmoke16 mentioned, if anyone reading this has knowledge on this stuff, a PR to the reference guides would be :fire: (and preferably something written from the viewpoint of a person that’s never done this stuff :stuck_out_tongue:)