Proposal: Incremental compilation

Duplicated from GitHub:

Every file can be compiled into a compilation unit of bytecode; let's call it *.crgb. (.crc is already taken by CRC, so I chose *.crb, meaning Crystal bytecode, and *.crgb for Crystal general bytecode.) The *.crgb is roughly the bytecode counterpart of a *.cr file: it only includes the prototypes of functions.

# sum.cr
def sum(a, b)
  a + b
end

Then let's say foo.cr depends on sum.cr:

# foo.cr
require "./sum"

sum(1, 2)

When you run crystal foo.cr, the compiler fetches foo.{cr,crgb} and sum.crgb to produce the final result; there is no need to recompile sum.cr. At the call sum(1, 2), the compiler realizes it needs to instantiate the function as a sum(Int32, Int32) instance and does so. Whether it chooses to save that bytecode to sum.crb is a separate choice.

# foo.cr
require "./sum"

sum("a", "b")

When you run crystal foo.cr, the compiler again fetches foo.{cr,crgb} and sum.crgb to produce the final result; no need to recompile sum.cr. At the call sum("a", "b"), the compiler realizes it needs to instantiate the function as a sum(String, String) instance and does so.
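Combining the two call sites makes the point concrete: a single untyped prototype of sum serves any number of type-specific instantiations, which is exactly what a prototype-only *.crgb would have to represent. A minimal sketch (the file name combined.cr is made up; it assumes sum.cr from above sits next to it):

# combined.cr
require "./sum"

puts sum(1, 2)     # instantiates sum(Int32, Int32)   => 3
puts sum("a", "b") # instantiates sum(String, String) => "ab"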

Step 1 → *.crgb alone is already a performance improvement, and I think it can already handle very large projects.

Step 2 → *.crb is problematic because we don't know all the instantiations beforehand; there may be some optimization in the future that cleverly knows when a *.crb doesn't need to be recompiled. For now let's only compile to *.crgb. (The sketch below shows why the instantiations can't be enumerated in advance.)
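To illustrate why: a downstream file can instantiate sum with a type that sum.cr has never seen, so the full set of instantiations only exists once the whole program is known. A minimal sketch (the Point struct and file name are made up for illustration):

# point_user.cr, depends on sum.cr from above
require "./sum"

# A type sum.cr has never seen; it exists only in downstream code.
struct Point
  getter x : Int32
  getter y : Int32

  def initialize(@x, @y)
  end

  def +(other : Point)
    Point.new(x + other.x, y + other.y)
  end
end

# Forces a brand-new instantiation, sum(Point, Point), which could not have
# been produced when sum.cr was compiled in isolation.
p sum(Point.new(1, 2), Point.new(3, 4))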

You may want to check out the thread that was previously started about incremental compilation.

That method isn’t going to be compiled multiple times anyway. The compiler creates LLVM IR rather quickly, then merges it into a single LLVM module before compiling it to machine code.

I don't quite understand you. The compiler compiles into LLVM IR, but the bytecode I'm talking about is not LLVM IR; I mean Crystal should define its own bytecode/VM. Of course this would mean great changes in Crystal, so maybe it goes into Crystal 2.0.

This saves the lexer → parser → AST → *.crgb time on subsequent builds.

*.crgb is where Crystal would define its own bytecode.

With your idea, would those bytecodes be converted to LLVM-IR at the end?

Hello, Crystal 2.0 would have its own VM, and thus its own bytecode, just like Ruby/Python/Java do.

Then you can do optimizations at the VM level: bytecode → machine code compilation for hotspots (or the whole program if you require it). If going bytecode → LLVM IR in the middle is easier to implement, then so be it.

I'd be very strongly opposed to Crystal having a VM requirement for all of its execution, beyond the operations of the interpreter. With Ruby/Python/Java, I can't execute programs written in them without some sort of runtime preinstalled. With Crystal's ability to compile down to a native binary, that binary can be freely distributed and used by those who don't have Crystal installed.

IIRC from that excellent thread that was linked, the cost of incremental compilation is due to the type system possibly changing drastically via a cascade effect from changing a single method signature, which requires the compiler to revisit each bit of code again. In your example above, what ensures that whatever is passed to the sum method has a + operator to begin with, if we don't continuously recheck the body of that method?
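A concrete illustration of that question: nothing in sum's prototype constrains its parameters, so the + requirement is only enforced when the body is re-type-checked against each new combination of argument types. A small sketch (assuming sum.cr from earlier):

require "./sum"

sum(1, 2)   # fine: Int32#+(Int32) exists
sum(1, "a") # compile-time error: no overload of Int32#+ accepts a String;
            # this is only caught by re-checking sum's body for (Int32, String)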


I understand the idea. The idea of a “type definition-free” Crystal-style compiled library is exciting.

However, if it were to be done without LLVM-IR, it would not be Crystal 2.0 but a different language implementation, much as JRuby is to CRuby.

I also feel that this proposal is larger than the impression one generally gets from the title “incremental compilation”.


What makes more sense to me is researching whether it's possible to make lli work for Crystal. It's LLVM's interpreter. I tried briefly and couldn't get it to work, but maybe someone with more time and will can make it work?


@beta-ziliani basically I'm proposing a multi-level bytecode hierarchy (an idea borrowed from MLIR):
*.crgb (level 1)
*.crb (level 2)
LLVM IR (level 3)

This also solves the problem that a captured block has to specify a type declaration. Currently the user has to distinguish between captured and non-captured blocks, and it's a mental burden to remember that there is a difference between them.
Proposal: compile the proc to prototype bytecode (at the *.crgb level if no type information is given), then instantiate it at the real call; see the sketch below.
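For context, a minimal sketch of the distinction being discussed, as Crystal works today (the method names twice and capture are made up):

# Non-captured block: inlined via yield, no type annotations needed.
def twice
  yield
  yield
end

# Captured block: received as a Proc, so its signature is declared.
def capture(&block : -> String) : Proc(String)
  block
end

twice { puts "hi" }           # prints "hi" twice

callback = capture { "deferred" }
puts callback.call            # => "deferred"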

I have essentially zero experience in this area, but it feels to me that would be a lot of work, complexity, and overhead. IMO I'd rather see that effort put towards other things that would be more impactful than just making it so you don't have to type captured blocks…

On second thought, maybe three levels are not needed.

If you take the compiler approach:
*.crgb (level 1)
LLVM IR (level 2)

If you take the VM approach:
*.crgb (level 1)
*.crb (level 2)
LLVM IR (level 3)

I can't imagine anyone in the community wanting a VM for Crystal. That goes directly against the goal of the language that it should compile to efficient native code. Having a VM would make that no longer true, AFAIK.

And my point still stands. Seems like a lot of effort to just make it so you don’t have to type captured blocks…


Yeah, but supporting incremental compilation is still a valid use case.

EDIT: found this discussion about performance:

Java Runtime Performance Vs Native C / C++ Code? - Stack Overflow

Is it so easy to distinguish?

Users already accustomed to Crystal know the difference, so they probably don't care.

For users new to Crystal, take me as an example: it's just confusing and leads to mistakes (I did make mistakes). So for new users, if there's no difference, that's better. (Under the hood, Crystal may apply different optimizations depending on whether a block is captured or not, but that does not concern the user and should not leak into user space.)

Also, unifying them is a step closer to letting captured blocks support break, next, etc.
I believe captured blocks currently don't support break or next?

Use &block within the method body and it will be treated as a captured block; otherwise it won't. It's very easy to distinguish. Anyhow, learning a statically compiled language has a learning cost.

I believe captured blocks currently don't support break or next?

They do support next.
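To make that concrete, a small sketch of next inside a captured block (each_item is a made-up method); break is the part that is not allowed there:

def each_item(items : Array(Int32), &block : Int32 ->)
  items.each { |item| block.call(item) }
end

each_item([1, 2, 3]) do |i|
  next if i == 2 # next works: it returns early from this call of the captured block
  puts i         # prints 1 and 3
  # break        # break is not allowed inside a captured block
end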


What if the VM were built into the produced executable? I don't know whether bytecode could be "recompiled" into properly efficient native code, but if it could provide fast (re-)compiles and speedy specs, a performance hit while developing could be acceptable to a lot of people. But I have no idea of the difficulty of these things, so feel free to ignore me.


Just checked: Erlang has a VM, and Erlang runs very fast.

I think the need for Crystal-specific bytecode is inevitable, no matter whether Crystal compiles or uses a VM. Maybe Crystal 2.0 still compiles, but Crystal 3.0 has a VM.

It supports:
1. incremental compilation
2. dynamic linking/loading of another Crystal artifact. I believe that currently, when you write a lib, others need to have your source code under their lib directory and compile it as a whole?

Yeah, Erlang has a VM called BEAM. Elixir also targets BEAM iirc.

The problem is it’s hard to define “very fast”. Very fast execution time? Fast GC? Fast to start? Fast to write?

If you move Crystal to bytecode that’s executed on a VM, it will be slower in execution time in many ways. There might be some cases where JIT could probably win out (assuming this hypothetical VM has JIT capabilities), but in general the code won’t execute as fast as the native code it produces now. Also remember that performance isn’t everything. Sometimes memory usage is important. Or safety. Or a bit of everything to some degree. VMs aren’t the silver-bullet answer to any of these.

Here are some benchmarks I run for fun whenever a language I follow has a new release. The languages/implementations that use a VM (some also use JIT) here are Clisp, C#, Lua, and LuaJIT. All of these “feel” fast for everyday tasks. The languages that compile to native code are C, Crystal, SBCL, CCL, Chez, Vala, Go, and FreePascal. Note which ones tend to be faster as far as execution time (ignoring ClozureCL lol).

--== BinaryTrees ==--
C      : == 33.8MB     00:00:02.077832304 ==
Crystal: == 48.9MB     00:00:01.367842335 ==
Sbcl   : == 100MB      00:00:00.569982422 ==
Ccl    : == 70.5MB     00:00:02.293936612 ==
Clisp  : == 46.9MB     00:00:17.227013321 ==
Chez   : == 94.2MB     00:00:00.523658229 ==
Pascal : == 64.1MB     00:00:05.213451451 ==
Csharp : == 70.3MB     00:00:01.094515756 ==
Vala   : == 31.2MB     00:00:04.581889339 ==
Go     : == 30.0MB     00:00:02.107452290 ==
Luajit : == 630MB      00:00:15.322043723 ==
Lua    : == 1.19GB     00:00:54.972512376 ==

--== Mandelbrot ==--
C      : == 1.11MB     00:00:06.173897144 ==
Crystal: == 3.15MB     00:00:06.253347560 ==
Sbcl   : == 13.1MB     00:00:07.079360982 ==
Ccl    : == 15.3MB     00:00:23.473544119 ==
Clisp  : == 9.01MB     00:17:14.722525261 ==
Chez   : == 29.5MB     00:00:57.551540460 ==
Pascal : == 160KB      00:00:10.328966159 ==
Csharp : == 14.5MB     00:00:11.484398139 ==
Vala   : == 2.72MB     00:00:06.176757542 ==
Go     : == 1.1MB      00:00:06.420972850 ==

--== nBody ==--
C      : == 1.45MB     00:00:00.527521874 ==
Crystal: == 3.3MB      00:00:00.795642814 ==
Sbcl   : == 12.8MB     00:00:01.213863330 ==
Ccl    : == 18.2MB     00:00:16.042523018 ==
Clisp  : == 8.93MB     00:03:16.288562580 ==
Pascal : == 160KB      00:00:00.993175458 ==
Csharp : == 15.8MB     00:00:01.617645118 ==
Vala   : == 2.92MB     00:00:02.059453463 ==
Go     : == 1.4MB      00:00:00.812265524 ==

--== SpectralNorm ==--
C      : == 1.35MB     00:00:00.877179948 ==
Crystal: == 4.06MB     00:00:01.238123371 ==
Sbcl   : == 13.1MB     00:00:02.351707795 ==
Ccl    : == 16.4MB     00:00:11.966900923 ==
Clisp  : == 8.93MB     00:08:23.072408225 ==
Chez   : == 29.5MB     00:00:18.400959997 ==
Pascal : == 160KB      00:00:05.078158506 ==
Csharp : == 13.4MB     00:00:12.116257471 ==
Vala   : == 2.99MB     00:00:01.314162542 ==
Go     : == 2.97MB     00:00:01.747539088 ==
Luajit : == 3.78MB     00:00:01.569396586 ==
Lua    : == 4.33MB     00:01:03.780537925 ==

Geometric Means:
  C      : 1.560898 seconds,   0.000 Joules (from 4 benchmark runs)
  Crystal: 1.703757 seconds,   0.000 Joules (from 4 benchmark runs)
  Lua    : 63.173521 seconds,   0.000 Joules (from 3 benchmark runs)
  Sbcl   : 1.842265 seconds,   0.000 Joules (from 4 benchmark runs)
  Ccl    : 10.083321 seconds,   0.000 Joules (from 4 benchmark runs)
  Clisp  : 204.828246 seconds,   0.000 Joules (from 4 benchmark runs)
  Chez   : 8.215773 seconds,   0.000 Joules (from 3 benchmark runs)
  Pascal : 4.059556 seconds,   0.000 Joules (from 4 benchmark runs)
  Csharp : 3.961826 seconds,   0.000 Joules (from 4 benchmark runs)
  Vala   : 2.958363 seconds,   0.000 Joules (from 4 benchmark runs)
  Go     : 2.093489 seconds,   0.000 Joules (from 4 benchmark runs)
  Luajit : 4.903709 seconds,   0.000 Joules (from 2 benchmark runs)
Overall geometric mean time based on all recorded runs: 5.784682 seconds
Total time of all results: 2192.788479 seconds

(The geometric means for ccl, clisp, chez, lua, and luajit are off since they aren't in every benchmark.)

The move to a VM would be a huge step back for Crystal IMO. The interpreter we have now is nice for debugging or for checking code, but I wouldn’t want it to be the default or the only way to run code. The main reasons I use Crystal are that it’s an easy language to learn, it has a solid standard library, and it compiles to super fast native code that doesn’t depend on a VM.


Really impressive.

Crystal beats all the reference languages except C on performance; not surprising.

But Go (I hate it, because it is 20 years behind other languages) beat C in memory usage in 3 of 4 tests. Surprising!
