Eagle view on how the compiler works?

Would it be possible to have an eagle view about how does the compiler work? I mean, a high-level workflow from having a bunch of files as source code, to the production of a native binary (but not below LLVM).

To give you some context, in this post I wrote something like that for Elixir (that is a summary of a talk of mine in a past ElixirConf EU).

Is the lexer/parser written by hand? How is execution of top-level code at compilation time solved? Who does the work when you pass --release? When are macros expanded (do you use that verb)? Does Crystal generate LLVM IR and passes that down using some API? Who generates the actual machine code? (I have 0 knowledge of LLVM.)

1 Like

https://github.com/crystal-lang/crystal/wiki/Compiler-internals would be a good starting point.

3 Likes

These are two presentations I gave at Manas some years ago:

We should have recorded those… they were spoken in Spanish but they would have been useful in this case :slight_smile:

Then some answers to your questions:

Yes, because we started the project in Ruby. Had we used a parser generator we would need to have a tool like that for Crystal when we translated it for Crystal… which didn’t exist at that point (that is, Crystal!). Also, hand-written parsers and lexers are easier to understand for me, and more customizable.

It’s interpreted. There’s an interpreter.cr file in the repo.

LLVM. We just pass something similar to -O2 to it. Crystal does no optimizations at all when you pass --release.

This has a complex answer, but it’s explained in one of the slides I linked.

The LLVM C API let’s you programmatically generate LLVM IR (that is, you don’t have to concatenate strings to build the IR). There’s an LLVM module in the standard library (poorly documented).

LLVM.

I think in one of the slides I also briefly talk about LLVM. Also in a recent remote talk I gave.

5 Likes

Fantastic!

@asterite if you want to give those talks again I can set that up and record it.

Mmm… I don’t think it’s interesting enough. Most probably want talks about using the language, or features, shards, etc. If you learn how the compiler works there’s not much you can do with that.

I think both topics are interesting. If you enjoy the subject and want to give a talk on it I will try and host it and record it. Almost everyone that came to your Chicago Crystal presentation talked about it later and enjoyed it. However I totally understand if you would rather work on making the compiler better than giving a talk on it :)

Is method and constant lookup solved at compile time so that at runtime things point directly to the exact spot? Last slides in the 2nd presentation above seem to suggest that, but would like to confirm.

Yes, but for methods a call might find multiple matching overloads. Then at runtime the overload is chosen based on the runtime types. This is called multidispatch.

2 Likes

As in

def f(x : Int32)
  :Int32
end

def f(x : String)
  :String
end

def g(x : Int32 | String)
  f(x)
end

g(0) # :Int32

for example?

Is the following dispatch resolved at compile or runtime?

class A
end

class B < A
end

class C < B
end

def f(a : A)
  :A
end

def f(c : C)
  :C
end

f(A.new) # :A
f(B.new) # :A
f(C.new) # :C

All those examples are solved at compile-time. You are passing one specific type and there’s only one overload that matches.

Multidispatch happens when the type consists of multiple types, such as a union type or a parent type. For example:

def foo(x : Int32)
  puts "int"
end

def foo(x : String)
  puts "string"
end

ary = [1, "foo"].shuffle
ary.each { |e| foo(e) }

Here the type of e is Int32 | String and depending on the actual type at runtime one of the two foo will be called. I used shuffle here so it’s clear that the values are shuffled at runtime so the compiler has no way to know which overload to call at compile-time.

This is similar:

class Foo
  def call
    puts "Foo"
  end
end

class Bar < Foo
  def call
    puts "Bar"
  end
end

ary = [Foo.new, Bar.new].shuffle
ary.each { |e| e.call }

The difference here is that the type of e is just Foo. Well, actually for the compiler it’s Foo+, which means Foo or any of its subclasses, and the dispatch happens at runtime.

5 Likes

I see! Thank you Ary!