Faster `--release` compile times but slightly worse performance?

Right now when you pass --release to the compiler, it will produce a single LLVM file that is then optimized and compiled. The reason we do this is that, if LLVM has all the code right there, it will be able to inline and really optimize the final executable. And that’s indeed what happens! However, it takes a lot of time to do this.

Instead, we could still produce many LLVM files (roughly one per class/struct/file), tell LLVM to optimize each of them independently, and then link them into an executable.

In my experiments, this leads to much faster compile times. For example, compiling the compiler in release mode usually takes a bit more than 5 minutes, but with this new approach it takes a bit more than 1 minute. Compiling a “Hello world” program in release mode takes around 8 seconds, and with the new approach it takes 3 seconds.

Do you think it would be worthwhile to either:

  • change the current --release mode so that it compiles faster, despite producing slower executables
  • add a way to use this new --release mode that’s faster to compile but slower to run?

My thinking is that if you are going to distribute or deploy a program you want it to be as fast as possible, even if you have to wait a few minutes to get there. What do you think?

Same for me.

Please don’t (unless the difference in execution speed is negligible).

I can probably see the use case for some very big project (maybe, sometimes).

We can call it --ludicrous-speed :wink:

I would definitely trade slower compile times for maximum performance in release mode as well. We already have non-release mode for local development. Also, for better or worse, synthetic benchmarks do affect the perception of the language.

Just curious, how far is your proposal from compiling today without --release?

What do you mean?

:+1:

That definitely sounds interesting for use cases where you want performant code, but don’t need ultimate optimization.
Maybe it could be a replacement for the current --release semantics, but we’d have to look into that.
Before deciding anything, we need some insights into the effect of the optimizations.
Specifically, how much runtime performance differs between the two optimization modes. I could see that it might not be that much, because many hot paths can probably be optimized inside a single module. But I don’t know. It depends on how many cross-module optimizations we’re losing.

It’s not clear how this new proposed way differs from the compilation today without --release.

If the new way is about as fast as compiling today without --release, can it just replace that mode while --release stays the same? That was my thinking.

Okay, here’s an example.

We’ll use this program, which is a mandelbrot benchmark: mandelbrot.cr at master in kostya/crystal-benchmarks-game on GitHub (run it with 10000)

  • Without --release
    • Time to compile: 1.35s
    • Time to execute: 15s
  • With the old --release mode
    • Time to compile: 7.75s
    • Time to execute: 7.32s
  • With the new --release mode
    • Time to compile: 3.14s
    • Time to execute: 8.53s

You can already see that the new --release mode is closer to “without --release” when it comes to how long it takes to compile the program, and closer to “with the old --release” when it comes to executing the program.

Next, havlak: havlak.cr at master in crystal-lang/crystal on GitHub

  • Without --release
    • Time to compile: 1.5s
    • Time to execute: 35.56s
  • With the old --release mode
    • Time to compile: 8.71s
    • Time to execute: 9.42s
  • With the new --release mode
    • Time to compile: 3.27s
    • Time to execute: 15.29s

Actually, it seems that this new --release mode is right in between the two modes. So these could be similar to -O1, -O2 and -O3 from other build systems… maybe?
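To put a number on “right in between”, here is a rough sketch (in Python, just for the arithmetic) that places the new mode on a scale from 0% (“without --release”) to 100% (“old --release”), for both compile time and execution time. The timings are copied from the two lists above; the names `timings`, `position` and `summarize` are made up for this illustration and are not part of any tool.

```python
# Timings in seconds, copied from the two benchmark lists above.
# Each entry is (compile_time, run_time).
timings = {
    "mandelbrot": {
        "no_release": (1.35, 15.0),
        "old_release": (7.75, 7.32),
        "new_release": (3.14, 8.53),
    },
    "havlak": {
        "no_release": (1.5, 35.56),
        "old_release": (8.71, 9.42),
        "new_release": (3.27, 15.29),
    },
}


def position(value, start, end):
    """Return where value sits between start (0.0) and end (1.0)."""
    return (value - start) / (end - start)


def summarize(modes):
    """Locate the new mode between non-release (0.0) and old --release (1.0)."""
    no_rel, old_rel, new_rel = modes["no_release"], modes["old_release"], modes["new_release"]
    compile_pos = position(new_rel[0], no_rel[0], old_rel[0])
    run_pos = position(new_rel[1], no_rel[1], old_rel[1])
    return compile_pos, run_pos


for name, modes in timings.items():
    compile_pos, run_pos = summarize(modes)
    print(f"{name}: pays {compile_pos:.0%} of the extra compile time, "
          f"gets {run_pos:.0%} of the runtime improvement")
```

On these two samples the new mode pays roughly a quarter of the extra compile time and recovers roughly 80% of the runtime improvement. Two benchmarks are a small sample, so this is only an indication.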

Finally, let’s take a big program: the compiler.

Compiling the compiler in release mode goes from about 5 minutes to 1.5 minutes. Then, compiling that havlak program above in non-release mode goes from 1.72s to 2.24s. To compile the compiler in non-release mode again, it goes from 33.35s to 38.98s.

(this last part was a bit confusing, but it shows that with this new approach, we’ll have to wait 1.5 minutes to compile the compiler in release mode instead of 5 minutes, but then compiling the compiler with this new compiler takes 38.98s instead of 33.35s, because it’s slightly slower)
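The trade-off in that last part can be sketched as a break-even calculation: how many non-release builds of the compiler it takes before the slower new-mode compiler has used up the time saved when building it. The figures are the approximate ones quoted above (“about 5 minutes” taken as 300s, “1.5 minutes” as 90s), and `break_even_runs` is a hypothetical helper written only for this illustration.

```python
# Approximate figures from the post above, in seconds.
OLD_RELEASE_BUILD = 5 * 60    # "about 5 minutes" with the current --release
NEW_RELEASE_BUILD = 1.5 * 60  # "1.5 minutes" with the proposed mode
OLD_COMPILER_RUN = 33.35      # non-release build of the compiler, old-mode binary
NEW_COMPILER_RUN = 38.98      # same build, new-mode binary


def break_even_runs(old_build, new_build, old_run, new_run):
    """Runs after which the slower binary has used up the one-time build saving."""
    saved_once = old_build - new_build
    lost_per_run = new_run - old_run
    return saved_once / lost_per_run


runs = break_even_runs(OLD_RELEASE_BUILD, NEW_RELEASE_BUILD,
                       OLD_COMPILER_RUN, NEW_COMPILER_RUN)
print(f"build time saved once: {OLD_RELEASE_BUILD - NEW_RELEASE_BUILD:.0f}s")
print(f"extra time per compiler build: {NEW_COMPILER_RUN - OLD_COMPILER_RUN:.2f}s")
print(f"break-even after about {runs:.0f} builds")
```

With these rounded inputs the break-even lands at a few dozen builds, so the answer depends a lot on how often the resulting binary is run.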

How about adding one more flag alongside --release and naming it --optimize (or --O3, echoing the -O3 of gcc/clang)? If it is present on the command line, the compiler does what it does today (reads one big IR file). If it is not present, we create a bunch of smaller files that compile much faster, at the expense of runtime performance.

I feel like --optimize is a nice short name except that --release also implies doing optimizations.

Maybe someone can come up with a more specific, but still short name (though not as short and cryptic as -O3). Something that implies extra optimizations applied globally.

--release --speed maybe?

Or offer both --O3 and --speed, so everyone can use whichever variant they like.

Please look at this from the point of view of users.

The meaning of --release should not change. It should continue to mean: compile to produce the fastest possible code.

If you are going to introduce new compile-time behavior, then give it new semantics.

Something like: --compile-opt-time

Please don’t make old names mean new things, create new names for new behavior.

Could be speed, yes!

No reason to not have both short and long versions of an option. So --optimize when you want to be verbose, and -O when you want to do less typing. Then --release becomes an alias for whatever the maximum is (and maybe other things you’d want in a release build).

Thanks for the feedback. There doesn’t seem to be a use case for this so I’m closing this discussion.

One use case: running specs for the compiler is way too slow on an unoptimized compiler, to the point that at times it is better to first spend the time on a --release build before executing the specs. In such situations, trading compilation time for a middle-ground execution (compilation) of specs is something worth trying.

Actually, I can see making this the default case for not compiling with --release.

Currently, if a user doesn’t compile with --release s/he knows the code won’t be optimized for speed, and is compiling to establish functionality.

But if using this method speeds up those cases, I don’t see users being upset about that, if they even cared, and it would actually give Crystal a better image with newbies, et al, who would just think Crystal compiles fast for non --release mode.

Then you wouldn’t have to create new semantics for the different modes.
It would be like speeding up any other part of the code base.

And if someone really wanted/needed the old slow compile mode, create a compiler flag for it, and state it would be deprecated in 2.0. :blush:

I do think there’s still a use case for the current non-release compilation. If the compilation times for this new non-monolithically-optimized method were, say, 1.2x as long as the current non-release compilation, I would agree with you. However, all the compilation times above are at least twice as long as the current non-release compilation.

That said, I think having a partway compilation solution could be nice. I just don’t think it properly replaces the existing compilation options, and I think we’d want to have a nice, meaningful flag for it.
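For reference, the “at least twice as long” figure can be checked against the two benchmarks in this thread that list both a non-release and a new-mode compile time. This is only a quick Python sketch of that arithmetic; `compile_times` and `ratios` are names invented for the illustration.

```python
# (non_release_compile, new_mode_compile) in seconds, from the benchmark posts above.
compile_times = {
    "mandelbrot": (1.35, 3.14),
    "havlak": (1.5, 3.27),
}

# How many times longer the new mode takes to compile than plain non-release.
ratios = {name: new / base for name, (base, new) in compile_times.items()}

for name, ratio in ratios.items():
    print(f"{name}: new mode compiles {ratio:.2f}x as long as non-release")
```

Both ratios come out above 2x and well above the 1.2x threshold mentioned above.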
