Exploring the Compiler

For my own education and fun times, I’m exploring the compiler, with a particular eye toward incremental compilation. I don’t realistically expect to deliver it, but I’m curious what obstacles I’ll run into along the way, and at the very least I’ll come away with a deeper appreciation for the Crystal core team :)

As a starting point, my first goal is to see if I can build the first idea from this post, specifically:

A new crystal tool that adds missing typing information to methods after type inference is complete (would operate similarly to the existing format crystal tool in my mind) to quickly add typing everywhere it’s needed.

For my trivial case, I’m using this basic crystal program as my proof-of-concept:

def hello!
  "Hello World!"
end

puts hello!

The goal is for the tool to add the inferred type, String, as the return type of that hello! method.
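For reference, this is roughly what the tool's output should look like for that program. It's a hand-written sketch of the goal, not something any tool generates yet: the same program with the inferred String return type annotated on the def.

```crystal
# Hand-written sketch of the desired output: identical behavior,
# but the inferred return type is now spelled out on the def.
def hello! : String
  "Hello World!"
end

puts hello!
```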

Using the good ol’ print debugging method, I’ve discovered:

  • The Program object is the container for all parsed and processed compiler data
  • Running program.semantic on the parsed code is where all of the semantic logic and inference gets applied.
  • After running semantic, two variables on the program contain references to the hello! definition: defs and def_instances
  • program.defs (which is nilable) contains a reference to the parsed function definition → program.defs.not_nil!["hello!"]. This is actually an array of DefWithMetadata, with DefWithMetadata#def being the actual definition.
  • Since the function definition has no return type annotation, program.defs.not_nil!["hello!"][0].def.return_type returns nil
  • program.def_instances contains the resolved (typed) definitions, one per instantiation of a def, and uses a DefInstanceKey as the key, which (among other properties) uses the def_object_id as the main part of the key.
  • This little blurb gets the resolved type for the hello! function:
# This is inefficient (it scans every key), but whatever
hello_def = program.defs.not_nil!["hello!"][0].def
key = program.def_instances.keys.find! do |k|
  k.def_object_id == hello_def.object_id
end
program.def_instances[key].return_type # => nil
program.def_instances[key].type # => String
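Since that lookup scans every key for each def, a tool that handles many defs could build an index once instead. Here's a minimal sketch of the idea. To keep it runnable without the compiler, FakeDefInstanceKey is a hypothetical stand-in for DefInstanceKey and strings stand in for Type objects; against the real program, the equivalent would presumably be program.def_instances.keys.group_by(&.def_object_id).

```crystal
# Hypothetical stand-in for the compiler's DefInstanceKey: only the two fields
# this sketch needs (the def's object_id and the argument type names).
record FakeDefInstanceKey, def_object_id : UInt64, arg_types : Array(String)

# Stand-in for program.def_instances: key => inferred return type name.
def_instances = {
  FakeDefInstanceKey.new(1_u64, [] of String) => "String",
  FakeDefInstanceKey.new(2_u64, ["Int32"])    => "Int32",
  FakeDefInstanceKey.new(2_u64, ["String"])   => "String",
}

# Build the index once: def object_id => every key instantiated from that def.
keys_by_def = def_instances.keys.group_by(&.def_object_id)

# Each lookup is now a hash access instead of a scan over all keys.
# `[]?` returns nil for defs that were never instantiated (e.g. never called).
if keys = keys_by_def[2_u64]?
  keys.each { |key| puts "#{key.arg_types} -> #{def_instances[key]}" }
end
```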

Some additional experimentation along those lines shows that the same inference and typing happens on method arguments as well, woot! My next step will be a rudimentary tool as a first stab at the idea quoted above. As an aside, finding a way to serialize the Program object to a file and deserialize it back is probably where incremental compilation work would need to dig deeper.
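One wrinkle with arguments: as far as I can tell, DefInstanceKey also includes the argument types, so a def called with different argument types gets a separate instance per combination, and the tool would need to merge those into a union restriction. A minimal sketch of that merge, with a hypothetical FakeInstance record standing in for the compiler's keys and plain strings standing in for Type objects. It assumes every instance of the def has the same number of positional arguments.

```crystal
# Hypothetical stand-in for one def instance: which def it came from and the
# argument type names it was instantiated with (strings instead of Type objects).
record FakeInstance, def_object_id : UInt64, arg_types : Array(String)

# Merges the argument types seen across all instances of a single def into one
# union restriction per positional argument. Assumes every instance has the
# same number of positional arguments.
def arg_restrictions(instances : Array(FakeInstance)) : Array(String)
  return [] of String if instances.empty?

  arity = instances.first.arg_types.size
  (0...arity).map do |i|
    instances.map { |inst| inst.arg_types[i] }.uniq.sort.join(" | ")
  end
end

# Two instances of a def like `def double(x); x * 2; end`,
# one from double(1) and one from double("a").
instances = [
  FakeInstance.new(7_u64, ["Int32"]),
  FakeInstance.new(7_u64, ["String"]),
]
puts arg_restrictions(instances) # => ["Int32 | String"]
```

A real version would presumably build unions from the compiler's Type objects instead of joining strings, and would also have to handle default arguments, splats, and named arguments, which this sketch ignores.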

I actually don’t know what I intended with this post. I think I started it as a place to ask questions about the compiler internals, but as I typed them out, I came up with new experiments to answer them on my own. So now I’ll use it as a mini brain dump and very rough “documentation” of how the compiler works, for anyone else (including future me) to learn from.

Hope it helps someone! :grin:


How would this tool be affected by macros? I see a potential problem if a macro adds a type/method/whatever that isn’t actually in the user-written code but is generated at compile time. For example, a method might have an untyped argument, but a macro creates a type that ultimately ends up as the untyped argument. This would mean that the tool may try to fill in a type that won’t exist next compile…

Admittedly, if your program relies on a non-deterministic type, then a tool that deterministically types your program probably isn’t a good fit for your workflow :stuck_out_tongue: That said, how this typer behaves with macros in general will be an interesting experiment, yeah.

There are definitely use cases where leaving a method untyped is more useful than always typing it, so one or more of the tool’s arguments will probably be the filenames / definition locations that should be typed, and anything not specified will remain untouched.
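A rough sketch of how those filename / location arguments might be matched against a def. The `file.cr:LINE` syntax, the DefLocation record, and both helper methods are hypothetical, not a settled CLI design; in the compiler, I'd expect the data to come from each def's location.

```crystal
# Hypothetical stand-in for where a def is defined.
record DefLocation, filename : String, line : Int32

# A spec is either "path/to/file.cr" (every def in that file) or
# "path/to/file.cr:LINE" (only the def starting on that line).
def spec_matches?(spec : String, loc : DefLocation) : Bool
  if m = spec.match(/\A(.+):(\d+)\z/)
    m[1] == loc.filename && m[2].to_i == loc.line
  else
    spec == loc.filename
  end
end

# With no specs, every def gets typed; otherwise only the ones a spec matches,
# and everything else is left untouched.
def should_type?(specs : Array(String), loc : DefLocation) : Bool
  specs.empty? || specs.any? { |spec| spec_matches?(spec, loc) }
end

loc = DefLocation.new("src/app.cr", 12)
puts should_type?([] of String, loc)      # => true
puts should_type?(["src/app.cr:12"], loc) # => true
puts should_type?(["src/app.cr:40"], loc) # => false
```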

This blog post is pretty insightful (along with the rest on the website).

When it comes to macros, there’s only so much we can do. At best, we can warn users that they may need to update stuff in macros, but honestly I wouldn’t worry about them or do anything to change them.


I only say this because these potential issues may not always be apparent to the user. People use other people’s code, sometimes as libraries, and I know I’m more than guilty of not reading and understanding all the code in a library. It’s entirely possible for several things to happen that are invisible to the user but matter a great deal to a tool working with the types Crystal resolves behind the scenes. I do actually want a tool like this, but at the same time I love to wonder what hiccups need to be addressed to take it on.


So, I’m trying to build “The Tool” :tm: and have run into a conundrum. So far I’ve been hacking within the compiler itself, adding said tool alongside the crystal tool format command. However, given its prototype status, I tried creating a separate repo to house it so I could develop and ship it independently. Unfortunately, in that separate project, I’m running into this compiler error:

cc: error: /usr/share/crystal/src/llvm/ext/llvm_ext.o: No such file or directory

This looks like it’s coming from the requirement chain:

compiler/crystal/program -> llvm -> lib_llvm_ext

This looks a bit tricky. I think it’s only needed for codegen, which I don’t intend to use, but copying it into this side project to get past this check might be the easiest path forward.

That, or I guess “continue building within the compiler” for the time being is another option :smile: I think it would be awesome if this tool ended up within the compiler tools, since it ties in so much with the compiler itself, but that position would need to be earned, not granted.

Yeah, LLVM is used to query some information for the semantic phase.

The simplest way to remove the dependency on libllvm_ext is to install LLVM 18 or newer. The extension lib is only necessary for older versions.

For older versions you can run make deps in the compiler source tree.


Here’s version 0.1.0 of the creatively named cr-source-typer project:

It works on my machine, but buyer beware and all that :) If you try it and find issues, please let me know so I can improve it!

And another mini brain dump along the lines of the one that started this thread:

  • The semantic method takes a cleanup parameter; setting it to true will cause it to expand the returned ASTNode to contain all required files and expanded macros. Setting it to false doesn’t.
  • After semantic has run, the program object has a types variable that’s a hash of the type name => Type instance representing that class / struct / whatever.
  • The types hash only contains the top-level types. To get to the types nested within “namespaces”, a breadth-first-search expansion can be done, using each type’s own types variable to get the types nested under it.
  • Similarly, the previously described defs and def_instances on the program object only contain the top-level methods and their typed definitions. Use Type#defs and Type#def_instances to get the methods for individual types (keeping in mind not every Type actually has or supports methods).
  • Static / class methods don’t exist on the Type directly (which contains instance-level variables and methods), but instead on the *MetaclassType, which is a type specific to capturing class-level information. Use Type#metaclass to get the metaclass of a given type. This is the difference between String and String.class.
  • *MetaclassType types also use defs and def_instances to store the class-level methods (like new and allocate, by default).
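The breadth-first expansion and the instance vs. class method split above can be sketched like this. FakeType is a hypothetical stand-in for the compiler's Type so the sketch runs on its own: method "definitions" are just names, and every type is assumed to support nested types and methods, which isn't true of every real Type. With the compiler, the walk would start from program.types, and class methods would come from each type's metaclass.

```crystal
# Hypothetical stand-in for the compiler's Type, keeping only what this
# sketch needs. Method "definitions" are just names here.
class FakeType
  getter name : String                    # full name, e.g. "Foo::Bar"
  getter types = {} of String => FakeType # types nested in this namespace
  getter defs = [] of String              # instance-level methods
  property metaclass : FakeType?          # holds class-level methods

  def initialize(@name : String)
  end
end

# Breadth-first expansion: the array doubles as the queue and the result,
# growing as each visited type's own `types` are appended to the end.
def all_types(top : Hash(String, FakeType)) : Array(FakeType)
  queue = top.values
  index = 0
  while index < queue.size
    queue.concat(queue[index].types.values)
    index += 1
  end
  queue
end

# "Type#method" for instance methods, "Type.method" for class methods
# (the latter read from the metaclass, as with String vs String.class).
def method_listing(top : Hash(String, FakeType)) : Array(String)
  names = [] of String
  all_types(top).each do |current|
    current.defs.each { |d| names << "#{current.name}##{d}" }
    if meta = current.metaclass
      meta.defs.each { |d| names << "#{current.name}.#{d}" }
    end
  end
  names
end

# A tiny hierarchy: Foo (with Foo::Bar nested inside) and an empty Baz.
foo = FakeType.new("Foo")
foo.defs << "greet"
foo.metaclass = FakeType.new("Foo.class").tap { |m| m.defs << "new" }
bar = FakeType.new("Foo::Bar")
bar.defs << "run"
foo.types["Bar"] = bar
top = {"Foo" => foo, "Baz" => FakeType.new("Baz")}

puts method_listing(top) # => ["Foo#greet", "Foo.new", "Foo::Bar#run"]
```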

This is really cool! I think it will be very useful; it’s something I’ve been wanting for a while (see crystal-lang/crystal issue #14696, “Add `crystal tool method_types` for listing method parameter types”). It would be great for tooling support / the VS Code extension.


Oh cool! Yeah, there’s a lot of overlap between what the source-typer does and that request; it wouldn’t be difficult to support JSON output containing all of the method types in the program. Though since this tool can add those types directly, is that JSON output still useful?
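To make that concrete, here's a rough sketch of what such a JSON dump could look like, using Crystal's JSON::Serializable. The MethodTypes record, its field names, and the sample entries are hypothetical; they aren't taken from issue #14696 or from anything cr-source-typer currently outputs.

```crystal
require "json"

# Hypothetical shape for one entry of a JSON dump of method types.
record MethodTypes, name : String, location : String, arg_types : Hash(String, String), return_type : String do
  include JSON::Serializable
end

# Made-up sample entries: one def without arguments, one with a single argument.
methods = [
  MethodTypes.new("hello!", "src/hello.cr:1", {} of String => String, "String"),
  MethodTypes.new("greet", "src/hello.cr:5", {"name" => "String"}, "Nil"),
]

# An editor extension could read this and show the types as hints
# instead of rewriting the source.
puts methods.to_pretty_json
```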


JSON output could be useful for suggesting changes instead of applying them automatically (and for other tools). The easiest thing, though, would be to let the tool apply the changes directly when a command is run (in VS Code, for example).

The hard part is distribution, which means either WASM, compiling on the user’s machine, or system / language package management. How hard do you think it would be to integrate it back into Crystal itself as a CLI tool? That’s the ideal case right now (for distribution), though it may be a pain. Otherwise, I can look into packaging it as part of the extension itself.

It wouldn’t be difficult to roll this back into the compiler, given that’s where it started until Straight-shoota helped me break it out above. It would let me get rid of some hacks I needed as well. I wanted a chance to find bugs and get it stabilized before proposing a PR, since it would become a new tool the core team would need to maintain, and I wanted that to be a successful conversation :sweat_smile:
