`crystal spec -O1` is VERY slow, with very high CPU usage, on its first run

Please check the following screenshot.

Why is the first spec run with -O1 so slow? (15 minutes on a very small shard.) I almost suspected
crystal spec was stuck in an infinite loop.

You can reproduce it using GitHub - crystal-china/captcha (a Crystal library that generates image CAPTCHAs) and running the specs with: crystal spec -O1
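A reproduction sketch, for reference (the repository URL is inferred from the link text above; `shards install` is assumed in case the library has dependencies):

```shell
# Clone the library and run its specs with -O1 on a cold cache.
# URL inferred from the "crystal-china/captcha" link above.
git clone https://github.com/crystal-china/captcha.git
cd captcha
shards install          # fetch dependencies, if any
crystal clear_cache     # ensure a cold compilation cache
time crystal spec -O1   # the first run is the slow one
```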

I don’t usually run specs with -O1, but that does take an exceptionally long time to compile. I let it run for 27 minutes on my machine before killing it.

➜  captcha git:(master) time csp -O1
^C
crystal spec -O1  24.24s user 0.60s system 1% cpu 27:48.08 total

Without -O1, it takes significantly less time even on a cold compilation:

➜  captcha git:(master) crystal clear_cache
➜  captcha git:(master) time csp
...

Finished in 557.48 milliseconds
3 examples, 0 failures, 0 errors, 0 pending
crystal spec  10.23s user 0.76s system 126% cpu 8.715 total

Even --release wasn’t bad:

➜  captcha git:(master) crystal clear_cache
➜  captcha git:(master) time csp --release
...

Finished in 101.54 milliseconds
3 examples, 0 failures, 0 errors, 0 pending
crystal spec --release  14.58s user 1.38s system 99% cpu 16.127 total

Since the second run took 2 seconds, I suspect the cache mechanism is involved. Adding the -O1 option likely forces many standard library classes to be recompiled into object files. However, I’m not sure that alone explains the slow compilation.

Created issue


ysbaddaden has already commented on the GitHub issue, but I was also able to reproduce the problem on my side.

rm -r tmp/*
CRYSTAL_CACHE_DIR=/home/kojix2/Crystal/captcha/tmp crystal spec -O1 --stats
Parse:                             00:00:00.000041077 (   0.98MB)
Semantic (top level):              00:00:00.379616584 ( 151.17MB)
Semantic (new):                    00:00:00.001428824 ( 151.17MB)
Semantic (type declarations):      00:00:00.030816626 ( 159.17MB)
Semantic (abstract def check):     00:00:00.009114052 ( 167.17MB)
Semantic (restrictions augmenter): 00:00:00.010037323 ( 167.17MB)
Semantic (ivars initializers):     00:00:00.014306303 ( 175.17MB)
Semantic (cvars initializers):     00:00:00.188233005 ( 207.23MB)
Semantic (main):                   00:00:00.446909827 ( 287.36MB)
Semantic (cleanup):                00:00:00.000357046 ( 287.36MB)
Semantic (recursive struct check): 00:00:00.000592775 ( 287.36MB)
Codegen (crystal):                 00:00:00.544523850 ( 335.36MB)
Codegen (bc+obj):                  00:48:03.114563685 ( 343.36MB)
Codegen (linking):                 00:00:00.431805899 ( 343.36MB)

Codegen (bc+obj):
 - no previous .o files were reused
...

Finished in 206.41 milliseconds
3 examples, 0 failures, 0 errors, 0 pending

As shown above, Codegen (bc+obj) took 48 minutes and 3 seconds.
Next, I inspected the timestamps in the cache directory:

ls -ltr --time-style="+%H:%M:%S" tmp/home-kojix2-Crystal-captcha-spec/ | tail -n 5
-rw-r--r--. 1 kojix2 kojix2   4316 18:17:53 S-lice40U-I-nt3241.o1.bc
-rw-r--r--. 1 kojix2 kojix2   4248 18:17:53 S-lice40U-I-nt3241.o1.o
-rw-r--r--. 1 kojix2 kojix2 690024 19:05:28 S-taticA-rray40U-I-nt3244321638441.o1.o
-rw-r--r--. 1 kojix2 kojix2   2596 19:05:28 C-rI-mage5858T-I-F-F-.o1.bc
-rw-r--r--. 1 kojix2 kojix2   2912 19:05:28 C-rI-mage5858T-I-F-F-.o1.o

It appears that generating S-taticA-rray40U-I-nt3244321638441.o1.o alone took 47 minutes and 35 seconds.


Oh, nice idea to use file timestamps!


To be honest, what I still don’t quite understand is why the llc command doesn’t take much time to compile the .bc file.

llc-18 -O1 -time-passes -filetype=obj S-taticA-rray40U-I-nt3244321638441.o1.bc -o out.o

This takes only 29 seconds.
What is the difference between the Crystal compiler and the llc command?

llc -O1 only runs backend optimizations like instruction selection and register allocation. It does NOT run middle-end optimization passes, which is why it’s fast.

Crystal’s compiler runs the full LLVM optimization pipeline via LLVM.run_passes with “default”. This is equivalent to running opt -O1 first, then doing what llc does.
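Based on that description, the compiler's per-unit work can be approximated on the command line as two separate stages (tool names with the -18 suffix match the commands used in this thread; the filename is the one from the cache listing). Timing each stage of the actual pipeline, rather than each tool in isolation, may show where the time goes:

```shell
# Stage 1: middle-end optimization (what opt does).
# Stage 2: backend codegen on the *optimized* bitcode (what llc does).
# Running llc directly on the original .bc skips stage 1, so neither
# tool alone necessarily reproduces the combined pipeline's cost.
time opt-18 -O1 S-taticA-rray40U-I-nt3244321638441.o1.bc -o optimized.bc
time llc-18 -O1 -filetype=obj optimized.bc -o out.o
```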


That’s right.
But when I run the opt command, it takes less than a second.

opt-18 -O1 -time-passes S-taticA-rray40U-I-nt3244321638441.o1.bc -o /dev/null

So it’s hard to tell which part of the pipeline is actually slow.
In fact, llc seems to be taking longer.
My understanding of compiler internals is fairly limited, and this investigation is based largely on AI assistance.

Can anyone reproduce the long object-file generation time from .bc files, and explain why?

Just to clarify: crimage switched from StaticArray to Slice in this recent commit, so anyone who wants to reproduce this needs to use the code from before that commit.