Reuse of Codegen (bc+obj) on Release builds?

Fijxu · May 15, 2025, 6:53am

Why doesn't the Crystal compiler generate many object files for release builds, the way it does for debug builds?

When I compile a Crystal program with --release and then rebuild it after a small change to the source code, I get this at compile time (with the -s argument of course):

Codegen (bc+obj):
 - no previous .o files were reused

That makes release builds really time-consuming.

But when a debug build is made, the output is like this:

Codegen (bc+obj):
 - 1905/1906 .o files were reused

and of course, the build is faster than a release build.

Why is that? Why isn’t Crystal able to create multiple objects for --release like a debug build?

syeopite · May 15, 2025, 7:19am

--release generates a single LLVM module, in addition to applying aggressive optimizations, to get the best performance possible. I believe it's just an alias for -O3 --single-module.

If you compile with just -O3 then you’ll see re-use.

straight-shoota · May 15, 2025, 9:32am

--release applies the best code optimization possible.
This optimization doesn’t work across module boundaries, so all code is merged into a single module.

yxhuvud · May 16, 2025, 4:02pm

One way around that is link-time optimization (LTO). We used to have some sort of support for it, but because it wasn't on by default it suffered from code rot, so it was removed.

The implementation was also not faster. It allowed a higher level of parallelization in the initial phase where the object files were generated, but then the LTO step was slow enough to eat any gains.

I wonder a bit if it could be structured differently and be faster that way - my impression is that link_with_LTO(A.o, B.o, C.o, D.o) was essentially LTO(LTO(LTO(A.o, B.o), C.o), D.o). Perhaps it would be possible to structure it like a tree, LTO(LTO(A.o, B.o), LTO(C.o, D.o)). That could potentially let some of the steps be executed in parallel and thus be faster in aggregate, assuming multiple CPU cores. But I don't know whether that is possible, or whether it would be faster if it is. Perhaps someone more familiar with compiling big C/C++ projects could chime in on whether something like that would work?
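To make the two shapes concrete, here is a rough Python sketch. It is a toy model only: merge is a hypothetical stand-in for one pairwise LTO step and just records the nesting, and nothing here reflects how the old Crystal LTO support or LLVM actually worked. It only shows that the chained form needs n-1 strictly sequential steps, while the tree form needs about log2(n) rounds whose merges are independent of each other.

```python
from typing import List, Tuple


def merge(a: str, b: str) -> str:
    # Hypothetical stand-in for one pairwise LTO step; it only records nesting.
    return f"LTO({a}, {b})"


def linear_fold(objs: List[str]) -> Tuple[str, int]:
    """Chain the merges: each step depends on the previous result.

    Returns the nesting and the number of sequential steps (n - 1).
    Expects a non-empty list.
    """
    acc = objs[0]
    steps = 0
    for obj in objs[1:]:
        acc = merge(acc, obj)
        steps += 1
    return acc, steps


def tree_reduce(objs: List[str]) -> Tuple[str, int]:
    """Merge neighbours pairwise, round by round.

    Merges inside one round do not depend on each other, so in principle
    they could run in parallel. Returns the nesting and the number of rounds.
    Expects a non-empty list.
    """
    level = list(objs)
    rounds = 0
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd one out waits for the next round
        level = nxt
        rounds += 1
    return level[0], rounds


objs = ["A.o", "B.o", "C.o", "D.o"]
print(linear_fold(objs))  # ('LTO(LTO(LTO(A.o, B.o), C.o), D.o)', 3)
print(tree_reduce(objs))  # ('LTO(LTO(A.o, B.o), LTO(C.o, D.o))', 2)
```

Both shapes perform the same total number of merges (n-1), so the tree only helps if the individual merges can really run concurrently and the later, larger merges don't dominate the cost. That is exactly the open question above.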

Theoretically, some of those steps could also be done in parallel with code generation, but that would require more extensive changes to make the compiler less focused on phases than it is today.
