Consider the following snippet:
```crystal
N = 3000
{% if flag?(:bar) %}
  …
  class Bar
  end

  Bar.new
{% end %}

{% for i in 0...N %}
  class Foo{{ i }}
  end
{% end %}

{% for i in 0...N %}
  Foo{{ i }}.new
{% end %}
```
We are going to compile this twice, first with a cold cache, then with `-Dbar`:
```
$ crystal clear_cache
$ time crystal run --prelude=empty test.cr
real 0m2.023s
user 0m5.610s
sys 0m2.894s
$ time crystal run --prelude=empty -Dbar test.cr
real 0m1.962s
user 0m5.414s
sys 0m3.072s
```
These times suggest that the addition of `Bar` completely invalidates the object cache. Indeed, if you also pass `--stats` to the second compilation, it reports `no previous .o files were reused`. How can this be the case when none of the `Foo`s depend on `Bar`?
This counterintuitive behavior arises from the way Crystal splits LLVM IR into LLVM modules. Each non-generic type or generic instance gets its own LLVM module containing all of its class and instance methods, and everything else goes into a special LLVM module called `_main`. The bitcode for `Foo0` can be disassembled back to LLVM IR using `llvm-dis`:
```llvm
%Foo0 = type { i32 }
@":symbol_table" = external global [0 x ptr]
; Function Attrs: uwtable
define ptr @"*Foo0@Reference::new:Foo0"() #0 {
alloca:
%x = alloca ptr, align 8
br label %entry
entry: ; preds = %alloca
%0 = call ptr @malloc(i64 ptrtoint (ptr getelementptr (%Foo0, ptr null, i32 1) to i64))
call void @llvm.memset.p0.i64(ptr align 4 %0, i8 0, i64 ptrtoint (ptr getelementptr (%Foo0, ptr null, i32 1) to i64), i1 false)
%1 = getelementptr inbounds %Foo0, ptr %0, i32 0, i32 0
store i32 7, ptr %1, align 4
store ptr %0, ptr %x, align 8
%2 = load ptr, ptr %x, align 8
ret ptr %2
}
```
The line `store i32 7, ptr %1, align 4` is where `Foo0.allocate` stores `Foo0.crystal_type_id` to the newly allocated memory area; this `7` corresponds to the compile-time value of `Foo0.crystal_type_id`. If we drop `-Dbar` again, the same value now becomes `6`.
The compiler component responsible for generating these type IDs is the `Crystal::LLVMId` class; it assigns numerical IDs in sequential order, with types defined later in the source code or the compiler receiving larger IDs than their sibling types. In particular, all structs have larger type IDs than every class. (You can see this information by setting the environment variable `CRYSTAL_DUMP_TYPE_ID` to 1 during compilation.) Hence, by defining `Bar` at the beginning of the file, we have incremented the type ID of every single `Foo` by 1. Because that ID is inlined as a literal into each `Foo`'s LLVM module, every one of those modules changes and none of the cached object files can be reused.
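The IDs in question can also be observed from user code. The following is a minimal sketch, assuming the standard prelude (for `puts`); `Alpha` and `Beta` are hypothetical names used only for illustration, and the concrete numbers printed depend on the compiler version and on every other type in the program, so none are shown here.

```crystal
class Alpha
end

class Beta
end

# Class#crystal_type_id is the compile-time ID of the type itself.
puts Alpha.crystal_type_id
puts Beta.crystal_type_id

# Object#crystal_type_id reads the ID that was stored in the object
# when it was allocated, so it matches the ID of the object's class.
puts Alpha.new.crystal_type_id == Alpha.crystal_type_id
```

Adding or removing an unrelated class elsewhere in the program may change the first two numbers, which is exactly the effect described above.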
If we move `Bar` to the bottom of the file, then recompilations will be able to reuse the `Foo` object files, because their type IDs remain untouched. In practice, however, source code is split across separate files, which makes this specific workaround nearly impossible to pull off. Moreover, `Reference.new` is not the only construct that inlines type IDs; others such as `typeof` and `is_a?` do so as well. In short, if your code tries to remove the `Nil` from an `Int32?`, its cache will get invalidated _any_ time you add or remove a class.
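For concreteness, this is the kind of everyday code the last sentence refers to. It is only a sketch with a hypothetical helper name (`unwrap_or_zero`); the claim that such a check involves type IDs comes from the description above and is not something the snippet itself demonstrates.

```crystal
# Narrowing an Int32? (that is, Int32 | Nil) down to Int32.
# The is_a? check has to distinguish the union's members at runtime.
def unwrap_or_zero(value : Int32?) : Int32
  if value.is_a?(Int32)
    value
  else
    0
  end
end

puts unwrap_or_zero(5)
puts unwrap_or_zero(nil)
```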
***
This PR does not fight against the type ID assignment. It merely stops the inlining:
```llvm
@"Foo0:type_id" = external constant i32
; Function Attrs: uwtable
define ptr @"*Foo0@Reference::new:Foo0"() #0 {
; ...
%1 = getelementptr inbounds %Foo0, ptr %0, i32 0, i32 0
%2 = load i32, ptr @"Foo0:type_id", align 4
store i32 %2, ptr %1, align 4
; ...
}
```
The actual compile-time value is now defined in `_main`:
```llvm
@"Foo0:type_id" = constant i32 6
```
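The effect on the cache can be illustrated with a toy model. This is a sketch under stated assumptions, not the compiler's implementation: IDs are handed out in definition order starting at 1, a module counts as reusable only when its generated text is unchanged, and the helpers `assign_ids`, `type_module_ir`, and `main_ir` are hypothetical names invented for this example.

```crystal
# Toy model: assign sequential IDs in definition order (an assumption).
def assign_ids(types : Array(String)) : Hash(String, Int32)
  ids = {} of String => Int32
  types.each_with_index { |name, i| ids[name] = i + 1 }
  ids
end

# Text of a per-type module, with the ID either inlined as a literal
# or loaded from an external constant.
def type_module_ir(name : String, ids : Hash(String, Int32), inline : Bool) : String
  if inline
    "store i32 #{ids[name]}, ptr %1"
  else
    "%2 = load i32, ptr @\"#{name}:type_id\"\nstore i32 %2, ptr %1"
  end
end

# Text of `_main`, which holds the actual values in the non-inlined scheme.
def main_ir(ids : Hash(String, Int32)) : String
  ids.map { |name, id| "@\"#{name}:type_id\" = constant i32 #{id}" }.join("\n")
end

foos = ["Foo0", "Foo1", "Foo2"]
ids_before = assign_ids(foos)
ids_after = assign_ids(["Bar"] + foos) # Bar defined first shifts every Foo

reused_inlined = foos.count do |name|
  type_module_ir(name, ids_before, true) == type_module_ir(name, ids_after, true)
end
reused_external = foos.count do |name|
  type_module_ir(name, ids_before, false) == type_module_ir(name, ids_after, false)
end

puts "inlined:  #{reused_inlined}/#{foos.size} modules unchanged"  # 0/3
puts "external: #{reused_external}/#{foos.size} modules unchanged" # 3/3
puts "_main changed: #{main_ir(ids_before) != main_ir(ids_after)}" # true
```

In this model only `_main` differs between the two runs once the IDs are referenced symbolically, which mirrors the intent of the change.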
With this simple trick, the object file cache is now working as intended:
```
$ bin/crystal clear_cache
$ time bin/crystal run --prelude=empty test.cr
real 0m2.102s
user 0m5.386s
sys 0m2.442s
$ time bin/crystal run --prelude=empty -Dbar test.cr
real 0m1.482s
user 0m1.218s
sys 0m0.799s
```
As another example, we compile an empty file with the standard prelude, then add `class Foo; end; Foo.new` to it and recompile. These are the times:
| Metric | Cold cache | Recompile (before) | Recompile (after) |
|-|-|-|-|
| Codegen (crystal) | 00:00:00.355781638 | 00:00:00.356948294 | 00:00:00.317487232 |
| Codegen (bc+obj) | 00:00:00.317895037 | 00:00:00.336335396 | 00:00:00.112650966 |
| Codegen (linking) | 00:00:00.198366216 | 00:00:00.181540888 | 00:00:00.169724852 |
| .o files reused | (none) | 165/312 | 315/318 |
For an even larger codebase, we try this modification in `src/compiler/crystal/codegen/codegen.cr` of the Crystal compiler itself:
```cr
module Crystal
  class Foo
  end

  class Program
    def run(code, filename : String? = nil, debug = Debug::Default)
      Foo.new
      # ...
    end
  end
end
```
The times are:
| Metric | Cold cache | Recompile (before) | Recompile (after) |
|-|-|-|-|
| Codegen (crystal) | 00:00:07.874727066 | 00:00:07.681402042 | 00:00:06.774886131 |
| Codegen (bc+obj) | 00:00:05.924711212 | 00:00:05.867124109 | 00:00:00.959626107 |
| Codegen (linking) | 00:00:04.859327472 | 00:00:04.942958413 | 00:00:04.937458099 |
| .o files reused | (none) | 850/2124 | 2102/2124 |
This will hopefully improve build times in scenarios that involve frequent small recompilations, such as rapid prototyping and IDE integrations that run the whole compiler.