Just upgraded from 0.35.1 → 0.36.0, and when trying to run the spec suite on Lucky, it takes 10+ minutes to run. Is anyone else seeing this?
It seems that when I run crystal spec the first time, it’s like 10 minutes until I see the first green dot output. Then if I run crystal spec right after, the whole thing takes about 3 seconds to complete. After that, if I delete my ~/.cache/crystal/ and run crystal spec again, it’s back to the 10+ minutes to complete. With 0.35.1, it was always ~5sec to run the suite.
Here’s a few things I noticed.
I upgraded Crystal through homebrew on Mac (not the M1), and it took me from LLVM 10 to LLVM 9.
With no cache, it takes a really long time, but with the cache, it’s quick
While the specs are running, I checked my process and saw these:
// When running super slow
❯ ps aux | ack -i "crystal"
jeremywoertink 2635 100.0 1.3 4609252 223000 s000 R+ 9:24AM 0:13.64 /usr/local/Cellar/crystal/0.36.0/libexec/crystal spec
Ok, just tried the app I’m working on. There’s 3 of us working on this Lucky app. The first dev (using Apple M1 on crystal 0.35.1) took 12s to boot the app. The second dev (using a Arch Linux on crystal 0.35.1) took 10s to boot the app. Me using the Apple Intel on crystal 0.36 took 6 minutes and 53 seconds to boot the app.
Booting or compiling the app. Running crystal build src/start_server.cr takes just as long as crystal src/start_server.cr. Also the same for crystal spec and crystal build spec_runner.cr
This is before cache with Crystal 0.36.0
❯ time crystal build src/start_server.cr
crystal build src/start_server.cr 422.70s user 5.64s system 106% cpu 6:42.32 total
This is after cache with Crystal 0.36.0
❯ time crystal build src/start_server.cr
crystal build src/start_server.cr 20.00s user 2.29s system 144% cpu 15.403 total
This is before cache with Crystal 0.35.1
❯ time crystal build src/start_server.cr
crystal build src/start_server.cr 41.33s user 2.83s system 142% cpu 30.887 total
This is after cache with Crystal 0.35.1
❯ time crystal build src/start_server.cr
crystal build src/start_server.cr 9.05s user 1.62s system 133% cpu 8.006 total
These are using those tar builds. Surprising that it was a lot quicker The “slow” 0.36 only took about 1 minute to run. So I wonder what that one is doing that the brew version isn’t.
Do you have big specs files with lots of variables shared from the top level and inside specs? I think it might be related to one PR I sent, but I’m not sure.
I guess checking out dexter and running the specs is enough to reproduce this?
I’m seeing the same slowness with Homebrew-installed Crystal. Most of my apps are probably smaller than Jeremy’s, but the one I’m working on took over a minute to compile with no cache and 1.5 seconds with a populated cache — a factor of 40x on a first compile.
I wonder if the problem is the LLVM change down to 9.0.1. Compiling the Crystal compiler looks like it uses LLVM 11 and only takes 90 seconds on the first, 16 seconds on subsequent compiles — a factor of 5.5x.
➜ crystal git:(master) rm -rf ~/.cache/crystal
➜ crystal git:(master) time (make clean crystal)
Using /usr/local/opt/llvm/bin/llvm-config [version=11.0.0]
rm -rf .build
rm -rf ./docs
rm -rf src/llvm/ext/llvm_ext.o
rm -rf src/ext/sigfault.o src/ext/libcrystal.a
c++ -c -o src/llvm/ext/llvm_ext.o src/llvm/ext/llvm_ext.cc -I/usr/local/Cellar/llvm/11.0.0_1/include -std=c++14 -stdlib=libc++ -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
cc -fPIC -I/usr/local/opt/llvm/include -c -o src/ext/sigfault.o src/ext/sigfault.c
ar -rcs src/ext/libcrystal.a src/ext/sigfault.o
CRYSTAL_CONFIG_LIBRARY_PATH="/usr/local/Cellar/crystal/0.35.1_1/embedded/lib:/usr/local/lib" CRYSTAL_CONFIG_BUILD_COMMIT="be5de4ac1" SOURCE_DATE_EPOCH="1611696425" ./bin/crystal build --link-flags="-L/usr/local/opt/llvm/lib" -o .build/crystal src/compiler/crystal.cr -D without_openssl -D without_zlib
( make clean crystal; ) 111.58s user 4.02s system 129% cpu 1:29.36 total
➜ crystal git:(master) time (make clean crystal)
Using /usr/local/opt/llvm/bin/llvm-config [version=11.0.0]
rm -rf .build
rm -rf ./docs
rm -rf src/llvm/ext/llvm_ext.o
rm -rf src/ext/sigfault.o src/ext/libcrystal.a
c++ -c -o src/llvm/ext/llvm_ext.o src/llvm/ext/llvm_ext.cc -I/usr/local/Cellar/llvm/11.0.0_1/include -std=c++14 -stdlib=libc++ -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS
cc -fPIC -I/usr/local/opt/llvm/include -c -o src/ext/sigfault.o src/ext/sigfault.c
ar -rcs src/ext/libcrystal.a src/ext/sigfault.o
CRYSTAL_CONFIG_LIBRARY_PATH="/usr/local/Cellar/crystal/0.35.1_1/embedded/lib:/usr/local/lib" CRYSTAL_CONFIG_BUILD_COMMIT="be5de4ac1" SOURCE_DATE_EPOCH="1611696425" ./bin/crystal build --link-flags="-L/usr/local/opt/llvm/lib" -o .build/crystal src/compiler/crystal.cr -D without_openssl -D without_zlib
( make clean crystal; ) 23.52s user 2.04s system 155% cpu 16.402 total
The compiler compiles “macro run” programs in release. For example it compiles “src/ecr/process.cr” from the standard library once, then it will invoke that executable once for each “macro run”.
The reason that program is compiled in release mode is because supposedly we are going to compile it once and run it multiple times. Then the program is also cached and reused in next compilations.
That bc+obj phase is just asking LLVM to go from a .bc file to an .o file, essentially compiling LLVM code.
The old compiler I’m using is linked against LLVM 10. The new one is against LLVM 11. My guess is that LLVM 11 is much, much slower than LLVM 10, but I don’t know.
I’ll try installing LLVM 9, compiling a compiler against it and seeing if that lowers the time (LLVM 10 is not available in Mac for some reason.)
As a workaround we can change the compiler so that it compiles macro run programs in non-release. Maybe it will still be fast enough.
It seems now all that initialization happens in the “main” function and that function ends up being huge and it makes LLVM have a really hard time optimizing that. Or that’s my guess.
I think we should revert those two PRs. After all they are just optimizations. And they might optimize the runtime performance of programs a bit, but if it makes compilation that much slower it’s not worth it, or we can rethink it later.
I was curious if there was a difference on Linux, too, so I created a branch upgrading one of my apps to Crystal 0.36. This is the CI build result (both were run just a few minutes apart):
The Docker builds compile 3 different app entrypoints with shards build --release --production. The spec run is just crystal spec. The master branch is using Crystal 0.35.1.