Exploring the Compiler

tsornson · October 24, 2024, 2:42am

For my own education and fun times, I’m exploring the compiler, with a particular eye for incremental compilation. I don’t have a realistic expectation of delivering it, but I’m curious what obstacles I’ll run into along the way, and at the very least will have a deeper appreciation for the crystal core team :)

As a starting point, my first goal is to see if I could build something in my first point from this post, specifically:

A new crystal tool that adds missing typing information to methods after type inference is complete (would operate similarly to the existing format crystal tool in my mind) to quickly add typing everywhere it’s needed.

For my trivial case, I’m using this basic crystal program as my proof-of-concept:

def hello!
  "Hello World!"
end

puts hello!

With the inferred type String as the intended type for that hello! method.
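For illustration, this is roughly the output the tool is aiming for on that program. It is only a sketch of the intended result, not actual tool output; the added `: String` return restriction is the only change.

```crystal
# Same program as above, with the inferred return type written out explicitly.
def hello! : String
  "Hello World!"
end

puts hello!
```

The program behaves identically; the restriction just records what inference already worked out.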

Using the good ol’ print debugging method, I’ve discovered:

The Program object is the container for all parsed and processed compiler data
Running program.semantic on the parsed code is where all of the semantic logic and inference gets applied.
After running semantic, there are two variables on it that contain references to the hello! definition: defs and def_instances
program.defs (which is nilable) contains a reference to the parsed function definition → program.defs.not_nil!["hello!"]. This is actually an array of DefWithMetadata, with DefWithMetadata#def being the actual definition.
Since the function definition has no return type restriction, program.defs.not_nil!["hello!"][0].def.return_type returns nil
program.def_instances contains the actual invocations / resolved definitions, but uses a DefInstanceKey as the key, which (among other properties) uses the def_object_id as the main part of the key.
This little blurb gets the resolved type for the hello! function:

# This is inefficient (a linear scan over every key), but whatever
hello_def = program.defs.not_nil!["hello!"][0].def
key = program.def_instances.keys.find! do |k|
  k.def_object_id == hello_def.object_id
end
program.def_instances[key].return_type # => nil (no restriction was written in the source)
program.def_instances[key].type # => String (the inferred type)

Some additional experimentation along those lines shows the same kind of inference and typing happens on the method arguments as well, woot! My next step will be a rudimentary tool as a first stab at the above quote block. As an aside, finding a way to serialize the Program object to a file and deserialize it again is probably where incremental compilation would need to explore more.

I actually don’t know what I intended with this post - I think I started it as a place to post questions about the internals of the compiler, but as I typed them out, I came up with new experiments to learn those answers on my own. So now I’ll use it as a mini-brain dump and very rough “documentation” on how the compiler works for anyone else (including future me) to learn from.

Hope it helps someone!

sol.vin · October 24, 2024, 6:59pm

How would this tool be affected by macros? I see a potential problem if a macro adds a type/method/whatever that isn’t actually in the user-written code and is generated at compile time. For example, a method might have an untyped argument, but a macro creates a type that ultimately ends up as the untyped argument. This would mean that the tool may try to fill in a type that won’t exist next compile…
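A minimal sketch of the situation described here, with made-up names (`define_payload`, `GeneratedPayload` and `summarize` are hypothetical and only for illustration): the type is never written out in the source, yet it is the only type that ever reaches the untyped argument.

```crystal
# The struct below only exists after macro expansion.
macro define_payload(name)
  struct {{name.id}}
    getter value : Int32

    def initialize(@value : Int32)
    end
  end
end

define_payload GeneratedPayload

# Untyped argument: inference would see GeneratedPayload here.
def summarize(payload)
  "value=#{payload.value}"
end

puts summarize(GeneratedPayload.new(42)) # => value=42
```

A typing tool would presumably want to write `payload : GeneratedPayload` here, which only stays valid as long as the macro keeps generating that name.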

tsornson · October 25, 2024, 3:13am

Admittedly, if your program relies on a non-deterministic type, then the tool that deterministically types your program probably isn’t a good fit for your workflow. Though how this typer will behave with macros in general will be an interesting experience, yeah.

There are definitely use cases where not typing a method provides more advantage than having it always typed, so one / some of the arguments will probably be filename / definition locations that should be typed instead, and those not specified will remain untouched.

nobodywasishere · October 25, 2024, 3:51am

This blog post is pretty insightful (along with the rest on the website).

When it comes to macros, there’s only so much we can do. At best, we can warn users that they may need to update stuff in macros, but honestly I wouldn’t worry about them or do anything to change them.

sol.vin · October 25, 2024, 5:36am

I just say this because sometimes these potential issues may not be apparent to the user. People use other people’s code, sometimes as libraries, and I know I am more than guilty of not reading and understanding all the code present in a library. It’s entirely possible that several things could happen that might be transparent to the user, but mean a whole lot to something dealing with what types Crystal may be dealing with on the backend away from the user. I do actually want a tool like this, but at the same time I love to wonder what hiccups need to be addressed to take it on.

tsornson · October 27, 2024, 3:29am

So trying to build “The Tool” and running into a conundrum. I’ve been hacking within the compiler itself so far and adding said tool in parallel to the crystal tool format command. However, given its prototype-ness, I tried creating a new separate repo to house it so I could develop and ship it independently. Unfortunately, in that separate project, I’m running into the compiler error:

cc: error: /usr/share/crystal/src/llvm/ext/llvm_ext.o: No such file or directory

This looks like it’s coming from the requirement chain:

compiler/crystal/program -> llvm -> lib_llvm_ext

This looks a bit tricky - I think it’s only needed for codegen purposes, which I don’t intend to use, but copy / pasting it into this side project to get past this check might be the easiest path forward.

That, or I guess “continue building within the compiler” for the time being is another option. I think it would be awesome if this tool ended up within the compiler tools, since it ties in so much with the compiler itself, but that position would need to be earned, not granted.

straight-shoota · October 28, 2024, 9:19am

Yeah, LLVM is used to query some information for the semantic phase.

The most trivial option to remove the dependency on libllvm_ext is to install LLVM version 18 or more. The extension lib is only necessary for older versions.

For older versions you can run make deps in the compiler source tree.

tsornson · November 3, 2024, 9:12pm

Here’s version 0.1.0 of the creatively named cr-source-typer project:

It works on my machine, but buyer beware and all that :) If you try it and find issues, please let me know so I can improve it!

And another mini brain dump along the lines of the one that started this thread:

Running the semantic process has a cleanup property; setting this to true will cause it to expand the returned ASTNode to contain all required files and expanded macros. Setting it to false doesn’t.
After semantic has run, the program object has a types variable that’s a hash of the type name => Type instance representing that class / struct / whatever.
The types hash only contains the top level types - to get to ones within “namespaces” (or subclasses, etc.), a breadth-first-search expansion can be done, using each type’s own types variable to get the subtypes under it.
Similarly, the previously described defs and def_instances on the program object only contain the top level methods and their typed definitions. Use Type#defs and Type#def_instances to get the methods for individual types (keeping in mind not every Type actually has or supports methods)
Static / class methods don’t exist on the Type directly (which contains instance level variables and methods), but instead on the *MetaclassType, which is a type specific to capturing class level information. Use type#metaclass to get the metaclass of a given type. This is the difference between String and String.class
*MetaclassType also uses defs and def_instances to store the class level methods (like new and allocate, by default).
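The breadth-first expansion over nested types mentioned above can be sketched like this. It runs against a small stand-in model rather than the compiler itself: `TypeNode` and `all_type_names` are hypothetical names, and the only thing assumed to carry over from the real Type objects is that each one exposes a `types` hash of its nested types.

```crystal
# Hypothetical stand-in for a compiler type: a name plus its nested types.
class TypeNode
  getter name : String
  getter types : Hash(String, TypeNode)

  def initialize(@name : String, @types = {} of String => TypeNode)
  end
end

# Breadth-first walk over a top-level types hash, collecting qualified names.
def all_type_names(top_level : Hash(String, TypeNode)) : Array(String)
  names = [] of String
  queue = Deque({String, TypeNode}).new
  top_level.each { |name, node| queue << {name, node} }

  until queue.empty?
    path, node = queue.shift
    names << path
    # Enqueue nested types with their namespace-qualified path.
    node.types.each { |name, child| queue << {"#{path}::#{name}", child} }
  end

  names
end

inner = TypeNode.new("Inner")
outer = TypeNode.new("Outer", {"Inner" => inner})
top = {"Outer" => outer, "Other" => TypeNode.new("Other")}

puts all_type_names(top).join(", ") # => Outer, Other, Outer::Inner
```

The same loop shape should apply to collecting defs per type: visit each node once, read whatever it holds, then enqueue its children.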

nobodywasishere · November 4, 2024, 5:43am

This is really cool! I think this will be really useful. It’s something I’ve actually been wanting for a bit (see Add `crystal tool method_types` for listing method parameter types · Issue #14696 · crystal-lang/crystal · GitHub). It would be a really cool thing for tooling support / vscode extension.

tsornson · November 5, 2024, 5:46am

Oh cool! Yeah, a lot of overlap of what the source-typer does and that request - it wouldn’t be difficult to support a JSON output that contained all of the method types in the program. Though with this being able to add those types directly, is having that JSON output still useful?

nobodywasishere · November 5, 2024, 10:36am

JSON output could be useful for giving hints about changes instead of automatically doing them (among other tools). The easiest thing to do though would be to let it handle it directly after executing a command (in vscode, for example).

Hard part is trying to distribute it, which either means WASM, compiling on the person’s machine, or system / language package management. How hard do you think it could be to integrate back into crystal itself as a CLI tool? That’s the most ideal case right now (for distribution), though may be a pain. Otherwise I can look into packaging it as part of the extension itself.

tsornson · November 5, 2024, 3:43pm

It wouldn’t be difficult to roll this back into the compiler, given that’s where it started until Straight-shoota helped me break it out above. It would let me get rid of some hacks I needed as well. I wanted a chance to find bugs and get it stabilized before proposing a PR, since it would become a new tool the core team would need to maintain, and I wanted that to be a successful conversation.

tsornson · November 10, 2024, 4:20am

Uploaded version 0.2.1, which supports adding type restrictions to splats and double splats. Also found and fixed a bug where a VirtualType as a method’s return type caused a trailing + to show up at the end of the written type.

zw963 · November 13, 2024, 6:02pm

Oops, I added all the types manually to my 2000 LOC shard over the past few days … this isn’t a very happy thing …

zw963 · November 13, 2024, 6:08pm

Hi, it doesn’t work on my shard, could you please have a look?

 ╰──➤ $ bin/typer src/procodile.cr
Unhandled exception: Element not found (Enumerable::NotFoundError)
  from /home/zw963/Crystal/share/crystal/src/enumerable.cr:555:5 in 'push_instance'
  from lib/source-typer/src/source_typer.cr:109:11 in 'accepted_def_instances'
  from lib/source-typer/src/source_typer.cr:136:22 in 'init_signatures'
  from lib/source-typer/src/source_typer.cr:34:5 in 'run'
  from lib/source-typer/src/cli.cr:12:1 in '__crystal_main'
  from /home/zw963/Crystal/share/crystal/src/crystal/main.cr:118:5 in 'main_user_code'
  from /home/zw963/Crystal/share/crystal/src/crystal/main.cr:104:7 in 'main'
  from /home/zw963/Crystal/share/crystal/src/crystal/main.cr:130:3 in 'main'
  from /usr/lib/libc.so.6 in '??'
  from /usr/lib/libc.so.6 in '__libc_start_main'
  from bin/typer in '_start'
  from ???

You can reproduce it on the add_source_typer branch.

Thanks

tsornson · November 13, 2024, 8:15pm

Thanks for the report! Yeah, will give that a try later tonight.

tsornson · November 14, 2024, 4:51am

Whelp, this is embarrassing. This particular bug had been fixed, but I hadn’t pushed the new version to GitHub yet. Say hello to 0.2.2!

That being said, I:

Cloned procodile
Ran all of the tests: make test (successful)
Ran the typer command on it: ./bin/typer src/procodile.cr (successful, results set up as PR here)
Ran tests again, failed

The resulting compilation error is:

In src/procodile/cli.cr:169:7

 169 | def initialize(@name : String, @description : String, @options : Proc(OptionParser, Procodile::CLI, Nil), @callable : Proc(Nil) | Proc(NoReturn))
       ^
Error: expected argument 'callable' to 'Procodile::CLI::Command#initialize' to be (Proc(Nil) | Proc(NoReturn)), not (Proc(Nil) | Proc(NoReturn))

Overloads are:
 - Procodile::CLI::Command#initialize(name : String, description : String, options : Proc(OptionParser, Procodile::CLI, Nil), callable : Proc(Nil) | Proc(NoReturn))

I’m confused here, because I can’t see what the difference is between those two type restrictions :/ Anyone spot the difference?

tsornson · November 14, 2024, 5:03am

Figured the above out, but it’s still strange. The callable property actually has a type restriction defined two lines above in a crystal getter, set as Proc(Nil). Somehow Proc(NoReturn) also got inferred as a type from somewhere.

~~The error message wasn’t helpful in identifying this, either. Any thoughts welcome.~~

EDIT: found / rediscovered this documentation: NoReturn - Crystal 1.14.0. Interesting use case. Neat.
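A small sketch of how a Proc(NoReturn) can come about, assuming the procodile case is similar (the `bail` helper is hypothetical): a proc whose body always raises or exits never returns, so that is what gets inferred for it.

```crystal
# A method that always raises never returns, so its type is NoReturn.
def bail(message : String)
  raise message
end

quit = -> { bail("stopping") } # body never returns normally
noop = -> { nil }

puts typeof(bail("x")) # => NoReturn
puts typeof(quit)      # => Proc(NoReturn)
puts typeof(noop)      # => Proc(Nil)
```

Neither proc is called here; `typeof` only asks the compiler what it inferred.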

zw963 · November 14, 2024, 10:32am

Thanks, this tool helped me a lot. I tried version 0.2.2, and it helped me discover a few missed type declarations.

As for NoReturn, the docs say:

NoReturn can be explicitly set as return type of a method or function definition but will usually be inferred by the compiler.

The only thing I need to do is change the several NoReturn restrictions added by this tool to the correct types.

tsornson · November 21, 2024, 5:36am

Created an initial pull request adding the source-typer tool to the compiler: Add Source Code Typing Tool by Vici37 · Pull Request #15211 · crystal-lang/crystal · GitHub

Some other thoughts / comments that aren’t directly related to this tool, but discovered along the way of building it:

An implicit require "prelude" gets inserted into the beginning of all crystal programs, and will either load this file, or a different crystal file if the --prelude <new-prelude> build option is provided.
Prelude is responsible for “filling out” all of the methods and behaviors of the base types of the language (such as the + operator for Int32).
Prelude isn’t cheap - when compiling a puts "hello world!" crystal file, running the semantic on the prelude takes about 1.5 seconds out of the 2 seconds total for building (my computer is a bit of a potato)
When running program.semantic, it’s typical (traditional?) to put all parsed (your file) and constructed (require "prelude") ASTNodes into a single expression and then run semantic on that in a single pass
I don’t think this is a required operation, at least I seemed to be able to run semantic on the require "prelude" and then run semantic on whatever the entrypoint file might be, without compiler / semantic errors being thrown (didn’t test the codegen, admittedly)
If the Program object could be serialized, then serializing whatever the result is of running semantic on prelude and packaging that into the compiler itself could potentially seriously shorten compile times for smaller programs
A potential POC for this could be with the crystal playground, where a semantic on prelude could be pre-run while the user is entering in crystal code. When the user clicks “run” or whatever, it should almost immediately return results (and in the background could preload another new Program with semantic running on prelude)
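The playground idea could be sketched along these lines. This is not compiler code: `WarmProgram` is a hypothetical stand-in for a Program that has already had semantic run on the prelude, and `WarmPool` is a made-up helper that keeps one of them ready while building the next in a background fiber.

```crystal
# Hypothetical stand-in for a Program with the prelude already processed.
record WarmProgram, id : Int32

# Keeps one pre-built WarmProgram ready and starts building the next one
# in a background fiber as soon as the current one is handed out.
class WarmPool
  @build : Proc(WarmProgram)

  def initialize(&build : -> WarmProgram)
    @build = build
    @ready = Channel(WarmProgram).new(1)
    refill
  end

  # Blocks only if the background build has not finished yet.
  def take : WarmProgram
    program = @ready.receive
    refill
    program
  end

  private def refill : Nil
    spawn { @ready.send(@build.call) }
  end
end

built = 0
pool = WarmPool.new do
  built += 1 # stands in for the expensive semantic pass over the prelude
  WarmProgram.new(built)
end

first = pool.take  # user clicks "run"
second = pool.take # the next run gets a fresh, independent program
puts first.id, second.id # prints 1, then 2
```

Each run gets its own pre-warmed instance, so nothing from one user program leaks into the next; whether a real Program can be reused this way after running semantic on the prelude alone is exactly the open question above.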

Might be the next random weekend project to try :) I’m having a lot of fun digging into the internals of the compiler! Well done Crystal Team!
