Reuse of Codegen (bc+obj) on Release builds?

Why doesn't the Crystal compiler generate many object files for release builds, the way it does for debug builds?

When I compile a Crystal program with --release and then rebuild it after a small change to the source code, I get

Codegen (bc+obj):
 - no previous .o files were reused

This is the compile-time output (with the -s argument, of course), and it makes release builds really time-consuming.

But when a debug build is made, the output is like this:

Codegen (bc+obj):
 - 1905/1906 .o files were reused

and of course, the build is much faster than a release build.

Why is that? Why isn’t Crystal able to create multiple objects for --release like a debug build?

--release generates a single LLVM module in addition to applying aggressive optimizations, so as to get the best performance possible. I believe it's just an alias for -O3 --single-module.

If you compile with just -O3 then you’ll see re-use.
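To illustrate why a single module defeats reuse, here is a minimal Python sketch of a content-based object cache. It is a toy model, not the compiler's actual implementation: the module names, the source snippets, and the helpers `fingerprint`, `count_reused`, and `as_single_module` are all hypothetical. The assumption is only that a cached object file can be reused when the module it was generated from is unchanged.

```python
import hashlib


def fingerprint(code):
    """Hash a module's contents so two builds can be compared cheaply."""
    return hashlib.sha256(code.encode()).hexdigest()


def count_reused(previous, current):
    """Count modules whose contents are identical to the previous build."""
    return sum(
        1
        for name, code in current.items()
        if name in previous and fingerprint(previous[name]) == fingerprint(code)
    )


def as_single_module(modules):
    """Merge every module into one, as a single-module build would."""
    merged = "\n".join(modules[name] for name in sorted(modules))
    return {"main": merged}


# Hypothetical program split into four modules.
before = {
    "A": "def a; 1; end",
    "B": "def b; 2; end",
    "C": "def c; 3; end",
    "D": "def d; 4; end",
}
# A small change that touches only module D.
after = dict(before, D="def d; 5; end")

multi_reused = count_reused(before, after)
single_reused = count_reused(as_single_module(before), as_single_module(after))

print(f"multi-module:  {multi_reused}/{len(after)} modules reused")
print(f"single-module: {single_reused}/1 modules reused")
```

In this model the multi-module build reuses three of four modules, while the merged module changes whenever any part of the program changes, so nothing can be reused.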

--release applies the best code optimization possible.
This optimization doesn’t work across module boundaries, so all code is merged into a single module.

One way around that is link-time optimization (LTO). We used to have some sort of support for it, but because it wasn't on by default the code rotted, so it was removed.

The implementation was also not faster. It allowed a higher level of parallelization in the initial phase where the object files were generated, but the LTO step was then slow enough to eat any gains.

I wonder a bit if it could be structured differently and be faster that way. My impression is that link_with_LTO(A.o, B.o, C.o, D.o) was essentially LTO(LTO(LTO(A.o, B.o), C.o), D.o). Perhaps it would be possible to structure it like a tree instead: LTO(LTO(A.o, B.o), LTO(C.o, D.o)). That could potentially let some of the steps execute in parallel and thus be faster in aggregate, assuming multiple CPUs. But I don’t know whether that is possible, or whether it would be faster if it is. Perhaps someone more familiar with compiling big C/C++ projects could chime in on whether something like that would work?
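The difference between the two shapes can be sketched in Python. This is only a toy model of the merge order, not how LLVM or any linker performs LTO: `chain`, `tree`, `chain_depth`, and `tree_depth` are hypothetical helpers, and the depth figures assume every merge costs the same and that there are enough cores to run one level's merges at once. Neither assumption is guaranteed to hold in practice.

```python
def chain(items):
    """Left fold: each merge has to wait for the previous one."""
    acc = items[0]
    for item in items[1:]:
        acc = f"LTO({acc}, {item})"
    return acc


def tree(items):
    """Pairwise merge: merges within one level do not depend on each other."""
    level = list(items)
    while len(level) > 1:
        merged = []
        for i in range(0, len(level) - 1, 2):
            merged.append(f"LTO({level[i]}, {level[i + 1]})")
        if len(level) % 2:
            merged.append(level[-1])  # odd one out moves up unmerged
        level = merged
    return level[0]


def chain_depth(n_objects):
    """Number of merge steps that must run one after another in the chain."""
    return max(n_objects - 1, 0)


def tree_depth(n_objects):
    """Number of levels in the tree, each level being one parallel round."""
    rounds = 0
    remaining = n_objects
    while remaining > 1:
        remaining = (remaining + 1) // 2
        rounds += 1
    return rounds


objects = ["A.o", "B.o", "C.o", "D.o"]
print(chain(objects))  # LTO(LTO(LTO(A.o, B.o), C.o), D.o)
print(tree(objects))   # LTO(LTO(A.o, B.o), LTO(C.o, D.o))
print(chain_depth(1906), tree_depth(1906))  # 1905 11
```

Both shapes perform the same number of merges (one fewer than the number of objects); only the length of the dependency chain differs. Whether that helps in a real build depends on how the cost of a merge grows with the size of its inputs, which this sketch deliberately ignores.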

Theoretically some of those steps could also run in parallel with code generation, but that would require more extensive changes to make the compiler less phase-oriented than it is today.
