Baseline content of a compiled executable?

A curiosity.

Compiling an empty file yields a 1.4 MB binary. What’s in there?

I know nothing about LLVM, but I notice that --emit llvm-ir generates a file of roughly 100K lines! I can recognize at least some parts related to the core of Crystal itself. I also see names that seem to be related to garbage collection, right? What else?

On the other hand, the binary obtained by compiling an empty file with --release is about 500 KB, almost 1 MB less. What’s left out?

https://github.com/crystal-lang/crystal/blob/master/src/prelude.cr contains all the code that is required by default.

You could also build with --no-debug to remove debug symbols for production binaries, which would reduce the size a bit more.

It’s also possible to supply your own prelude file, or to build without garbage collection. However, unless you know what you’re doing, it’s probably not worth it.
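If you want to compare the effect of these flags yourself, here is a rough sketch. It assumes the crystal compiler is on your PATH and only uses --release and --no-debug from above; the file name empty.cr and the helper build_and_report are made up for illustration. The sizes you get will depend on your platform and compiler version.

```shell
#!/bin/sh
# Build an empty Crystal program with different flags and print each binary's size.
# "empty.cr" and "build_and_report" are illustrative names, not part of Crystal.

workdir=$(mktemp -d)
: > "$workdir/empty.cr"   # create an empty source file

build_and_report() {
  label=$1
  shift
  # Remaining arguments are passed to the compiler unchanged.
  if crystal build "$workdir/empty.cr" -o "$workdir/$label" "$@" 2>/dev/null; then
    size=$(wc -c < "$workdir/$label" | tr -d ' ')
    printf '%-18s %s bytes\n' "$label" "$size"
  else
    printf '%-18s build failed\n' "$label"
  fi
}

if command -v crystal >/dev/null 2>&1; then
  build_and_report default
  build_and_report no-debug --no-debug
  build_and_report release --release
  build_and_report release-no-debug --release --no-debug
else
  echo "crystal compiler not found; skipping builds"
fi
```

The binaries end up in a temporary directory, so nothing in your project gets overwritten.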

I’m sure a similar question has been asked before, but I’m unable to dig it up. This SO question is somewhat related, though: Crystal-lang: why is the LLVM “hello.bc” not the same if generated by Crystal or by clang?

Looking at the prelude points in the right direction, but doesn’t answer this entirely. Requiring some source code doesn’t automatically result in a larger executable. Only code that is actually reachable from the program is included in the codegen. So if you do require "some_large_dependency", it doesn’t blow up your binary size at once. It depends entirely on which code is actually used.

The prelude essentially includes the core components of Crystal’s standard library. Not every program uses all of them, but quite a few are required by every program in order to provide a functioning Crystal runtime.

The runtime includes essential features such as:

  • process environment (standard file descriptors, argv)
  • signal handling
  • exception handling
  • garbage collection
  • concurrency runtime (main fiber, scheduler, threads)

(For further details, many of the entry points for these components can be found in src/kernel.cr and src/crystal/main.cr.)

The implementation of these runtime features already depends on many other core libraries from the prelude, such as the basic data types, collections etc.

In the end, most of the prelude is actually used by the runtime and thus by every Crystal program. So this code will always be included in the executable.

For an empty program, the Crystal runtime accounts for about 1.1 MB (0.8 MB without debug symbols). But that only covers the code generated from Crystal source.

The Crystal runtime features also depend on a few external libraries such as libgc, libpthread, libevent etc. Statically linked libraries also add to the binary size. These are currently libgc and libcrystal. The latter is only a tiny library for sigfault handling, but libgc comes in at about 0.4 MB.
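To check which of these libraries are linked dynamically on your system, you can list the shared-library dependencies of the binary. This is only a sketch: it assumes ldd (Linux) or otool (macOS) is available, and both the default path ./empty and the helper list_shared_libs are hypothetical names. Anything that is linked statically will not show up in the list, since it is already part of the executable.

```shell
#!/bin/sh
# List the shared libraries an executable depends on.
# "list_shared_libs" and the default path "./empty" are illustrative names.

bin=${1:-./empty}

list_shared_libs() {
  if [ ! -x "$1" ]; then
    echo "no executable at $1"
    return 0
  fi
  if command -v ldd >/dev/null 2>&1; then
    ldd "$1"          # Linux
  elif command -v otool >/dev/null 2>&1; then
    otool -L "$1"     # macOS
  else
    echo "neither ldd nor otool available"
  fi
}

list_shared_libs "$bin"
```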

Combined, that results in a binary size of about 1.4 MB for an empty Crystal program (1.1 MB without debug symbols). That’s how much space a program needs to do nothing. It seems bloated, but in practice this is not really relevant: a program that does something needs most of these dependencies anyway, so having them in the basic runtime doesn’t add much overhead. For example, when your program uses String#to_i, that doesn’t increase the executable size, because String#to_i is already used by the runtime and included in the codegen anyway.


That is an extraordinary response, thanks very much!

What about the second question? The one about the size with the --release option?

(To be clear, I am using an empty program as a means/exercise to discover the minimum that is included. Such a program is of little practical interest and does not deserve to be optimized for.)

Is this what you are looking for?