Incremental compilation exploration

Happy new year!

I believe that, at some point, incremental compilation will have to be introduced to Crystal in order for it to keep growing.

I started exploring this problem, just a bit. I wrote a blog post about it: Incremental compilation for Crystal - Part 1 - DEV Community 👩‍💻👨‍💻

35 Likes

I love that you made this post! I was just working on a little post asking about updates and conversations we should be having for v2.

2 Likes

The main thing that I would like to say is that I would be up for any and all breaking changes that focus on compilation speed and LSP support.

  • Removing parameter and return type inferencing
  • Replacing require
  • ???

I get that it’s probably already too late for v2, but I just don’t want these problems to continue to harden and turn into foundational flaws of the language.

4 Likes

Definitely!

The question we need to ask and answer (the core team, but mainly the community) is: what’s the appeal of Crystal? Is it mainly the standard library? Is it being able to prototype without specifying types in methods? Is it the concurrency model? Is it macros and compile-time reflection? What if we require you to require all file dependencies up-front?

We should try making these changes with the goal of allowing faster compile times, but also without turning the language into Go, Rust, or another language. Otherwise Crystal wouldn’t have anything special compared to other languages.

13 Likes

It’s something so obvious that I think someone has probably said it before, but can we leverage the type inference process through a tool? Most editors currently support format-on-save, which just calls the formatter for the language in question. In the same way, we should be able to insert the inferred types into the file being saved, which would keep users happy without forcing them to insert all the type annotations manually.

For methods whose types cannot be inferred from the current state of the code, we can put a _ as a placeholder. All methods that have _ as their return type, or that are generic, would get special treatment. Otherwise we can safely assume that the return types of methods in other files are correct, and treat them as such.

This might not improve the incremental compilation situation in the near future, but I think that tools like LSP can take advantage of it to improve the developer experience.

1 Like

  • Removing parameter and return type inferencing

This I’m skeptical about, as a whole lot of the niceness of Crystal lies in the ergonomics, and not having to specify types in these places adds very much to that. There are also some places in the Crystal src dir where return types have intentionally not been added because they were too restrictive or complex.

  • Replacing require

This I wouldn’t mind as much, as the number of requires is not all that high anyhow, and it doesn’t impact coding flow very much. Also, the way require works is one of the weaker parts of Ruby. Not certain if I’d prefer to state dependencies on a file-by-file basis or in a central place for each app target/shard.

What I wonder about, though, is whether there would be anything to gain by doing partial type checking first.
That is, parse and type as much of a file as possible, and then record what needs to be filled in later by other files. This added step would be very cacheable: if a file hasn’t changed, its partial type check stays the same. To resolve the unknowns, two files would be unified based on the dependency graph. (Or whatever the unit would be. It is not obvious that it should be source files; it could perhaps be just a single method definition inside a file.) The unification of two entities could itself be partial and cached, if either depends on something else.

Pros:

  • In theory, both the initial partial parse and any unifications should be possible to cache and parallelize, to a degree that depends on what the dependency graph looks like.
  • core and stdlib could come pre-unified as far as possible out of the box

Cons:

  • Not obvious how to represent two arbitrary unified pieces of code in a way that is both repeatable and where the cache will be busted if any dependency is changed.
  • Not certain it is a con, but it would mean that more than just the code that is actually called would be type checked.
  • Monkey patching might get even harder to reason about, as there would either need to be some sort of cache invalidation, or it would have to be possible to unify into something that modifies an earlier result as opposed to just extending it.
  • Nothing says it would actually be faster. It adds another step, and any speedup would have to come from caching and parallelization. But how much would have to be done in sequence anyhow? There would definitely be more work to do overall.
  • Finding good sized chunks to parallelize and how to prioritize what to do first could also be hard?
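
One hypothetical way to approach the cache-busting concern in the cons above is to key each file's cached partial result on a hash of its own contents plus the keys of everything it depends on. The sketch below is in Python rather than Crystal, purely for brevity. The file names, the `deps` map, and the `cache_key` helper are all made up for illustration and say nothing about how the Crystal compiler works. It also assumes the dependency graph has no cycles, which real Crystal code would not guarantee.

```python
import hashlib


def cache_key(name, sources, deps, memo=None):
    """Return a hex digest covering `name` and all of its transitive dependencies.

    sources: dict mapping file name -> source text
    deps:    dict mapping file name -> list of file names it depends on
    Assumes the dependency graph is acyclic (an assumption of this sketch).
    """
    if memo is None:
        memo = {}
    if name in memo:
        return memo[name]
    h = hashlib.sha256()
    h.update(sources[name].encode("utf-8"))
    # Sort dependencies so the key does not depend on listing order.
    for dep in sorted(deps.get(name, [])):
        h.update(cache_key(dep, sources, deps, memo).encode("ascii"))
    memo[name] = h.hexdigest()
    return memo[name]


# Hypothetical three-file project: app.cr depends on util.cr, other.cr stands alone.
sources = {
    "app.cr": 'require "./util"\nputs double(21)',
    "util.cr": "def double(x)\n  x * 2\nend",
    "other.cr": "puts 1",
}
deps = {"app.cr": ["util.cr"]}

before = {name: cache_key(name, sources, deps) for name in sources}

# Editing util.cr changes its own key and the key of app.cr, which depends on it.
sources["util.cr"] = "def double(x)\n  x + x\nend"
after = {name: cache_key(name, sources, deps) for name in sources}

changed = sorted(name for name in sources if before[name] != after[name])
print(changed)  # prints ['app.cr', 'util.cr']
```

The unrelated file keeps its key, so its cached result could be reused. This only covers invalidation by file contents; it does not address the harder question raised above of how to represent the unified result itself.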

How feasible is this idea? No idea. It would definitely be a very big departure from how the current type checking code works. I’m throwing a very much unbaked idea out there.

3 Likes

I don’t think it’s feasible to do that. From what I understand you’re basically suggesting the formatter should insert inferred type information into the source code (regardless of whether that’s actually the formatter or called something else).
The problem with that is that inferred information depends on how the code is used in this specific case. If I have two different entry points using the same library code but each in a different way, each of them would result in different inferred information.
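
A toy model of that point, written in Python for brevity. Here "inference" just means collecting the argument types each untyped method is actually called with, which is only a rough stand-in for what a real compiler does. The method name `add`, the two entry points, and the `infer_signatures` helper are hypothetical.

```python
def infer_signatures(calls):
    """Collect, per method, the set of argument-type tuples it is called with."""
    signatures = {}
    for method, arg_types in calls:
        signatures.setdefault(method, set()).add(tuple(arg_types))
    return signatures


# Two entry points that use the same untyped library method `add(a, b)`.
entry_point_a = [("add", ["Int32", "Int32"])]
entry_point_b = [("add", ["String", "String"]), ("add", ["Float64", "Int32"])]

inferred_a = infer_signatures(entry_point_a)
inferred_b = infer_signatures(entry_point_b)

# The same library method ends up with different inferred signatures.
print(inferred_a["add"] == inferred_b["add"])  # prints False
```

Writing either result back into the shared library source would make it wrong for the other entry point.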

1 Like

This is also the most important argument for never dropping type inference from the language, right?

3 Likes

Here is my point of view on asterite’s old 2.0 wishlist:

Instead of require merely loading files, change it, or introduce an import keyword to specify what you want to import. If you don’t import something, it’s not globally available (unlike the current behavior)

It sounds useful.


Mandatory types in method arguments and return types

Could we add a compiler toggle for this?


Disallow reopening types, but allow something similar by having extension methods (similar to C#)

I’d consider this part of the spirit of Crystal?


Type restrictions in generic type arguments

It sounds useful.


Macros can no longer introspect the entire program. They can still introspect types

Could we still use features like @type.methods? If not, that’s not good.

1 Like

That’s why I suggested using _ (or an Any/Auto type, to be more explicit) for methods that can’t or shouldn’t be inferred.

It doesn’t need to be fully typed just yet, we just add the type annotations gradually. Also, the first iterations of the tool should only attempt to insert the types on the current working file, so that the developer can manually fix the errors of the tool.

For more advanced uses, like the one you probably had in mind, I think that instead of adding the type annotations directly to the current files, we could create a mirror copy of the files (with all type information filled in) unique to each entry point (the compiler target), which should be guaranteed to have enough type information to be compiled.
I don’t know how practical it is yet (maybe the cache folder will blow up), but it should solve the problem of conflicting inferred type information.

Type inference should never be dropped. Even if you can automatically insert the type annotations using a tool (which still relies on type inference anyway), you will still need it to check the correctness of the type annotations.

Also, in some cases where the type is verbose (like recursive ones) or hidden (like String::Iterator), it is better to leave it to the compiler.

The second part is up: Incremental compilation for Crystal - Part 2 - DEV Community 👩‍💻👨‍💻

10 Likes

Caveat: I’m probably a bit of an oddball here in that I don’t really use LSPs because I’ve found them too annoying to set up in the past, and just didn’t find them to be too useful. As long as I have syntax highlighting in Emacs, that’s good enough for me. I think the closest thing to an LSP that I use regularly is Slime, but that is Emacs- and language-specific, and doesn’t use the same protocol.

Anyway… been thinking about this thread all day and thought I’d provide my own thoughts and experiences on some of the questions brought up by Asterite.

Is it mainly the standard library?

The standard library is easily one of my favorite parts of Crystal. It covers enough bases to get me started with most things that I work on, and I like the design of some of its modules (the JSON/YAML parsing especially). There’s always room for improvement and expansion, but it’s definitely one of the nicer standard libs that I’ve used.

Is it being able to prototype without specifying types in methods?
Removing parameter and return type inferencing

This is part of what makes Crystal feel so special to me, and part of the initial appeal that led me to try it out. It can feel like a dynamic language, but I can tighten down the types of variables later on as needed. This is exactly what I do in the other language I normally work in, Common Lisp, which has ways to optionally specify the types of things.

Not having to specify types in Crystal speeds up my coding, especially when doing experiments. But again, I also like that I can specify types in order to catch errors that would otherwise be obscure later on. Like when I wasted three days trying to track a bug down in my port of Doom to Crystal, only to find it was a single typo where I unknowingly switched an int to an int|float union because I didn’t specify a type.

Replacing require

I never realized exactly everything that require does until I read part 1 of Asterite’s post. I just figured the compiler was smart enough to pull in what a file needed and never considered how it was done. It’s quite nice to not have to specify everything, though.

I don’t compile an entire project from scratch all too often, especially not in release mode. What I instead find myself doing most often is writing a small temporary program in the root of the project, then requiring a specific file from within my source tree to experiment with something within it. This is where require ends up helping me. It lets me subdivide my program during development.

Is it the concurrency model?

The way Crystal approaches concurrency is quite nice IMO. But it isn’t a game changer for me, and it doesn’t feel too much like a feature unique to Crystal. It simply feels nicely done.

Is it macros and compile-time reflection?

The macro system in Crystal is a lifesaver. The syntax can get a bit unwieldy, at times, as can the error messages, but those are minor issues. This is my other favorite part of Crystal, next to the standard library. Perhaps that’s the Lisper in me speaking, though 😛

Overall, I see incremental compilation as sort of a no-win-but-no-loss thing for me, at least in the way it’s been proposed and discussed. Some of my larger projects (20k - 40k LOC, accounting for libraries) could potentially benefit from it, but likely not that much since I don’t actually compile things too often, as mentioned before. What would be beneficial is being able to compile a library into an actual shared library object in a way more akin to C/C++ or .NET. That sort of incremental compilation is something I could get behind, even if it meant trading resulting object code size (I’m already used to binaries that are 50 MB in the Lisp world).

Or, if not actual shared libraries, then maybe something where intermediate code gets stored per-source-file in a special binary format together with extra debugging info, and then the compiler can pull things from it as-needed. This is sort of what Common Lisp and Slime do with fasl files if I’m not mistaken, where the fasls store both the machine code and debug info, including location data to do stuff like code lookups. Maybe something along those lines could help with LSP stuff?

But… I hear that shared libraries are probably a no-go. So given that, I think reducing the memory usage of the compiler, or maybe finding more ways to parallelize it, would be a much better way to improve Crystal than incremental compilation.

6 Likes

That would be nice. Maybe it would be possible to have a “library light (no sugar)” system?
More like loading plugins than a true shared library.
If such a thing were possible, you could skip compiling the code for that part (if it has not changed).

I have only a vague idea of why libraries won’t work, so this might be just nonsense 😃

1 Like

Is it mainly the standard library?

I love the standard library, it’s soo good. Any efforts to improve its weak points and make it even better are appreciated.

Is it being able to prototype without specifying types in methods?

While this is nice, I usually end up sticking types in method definitions as I think it’s a nice boundary for surfacing type errors. Still, I think I would prefer this to stay.

Is it the concurrency model?

I haven’t really done much with concurrency, but I feel like there is a significant amount of repeated code when it comes to working with fibres and channels. It would be nice if I didn’t have to do all the plumbing myself to set up communication with a fibre.

Is it macros and compile-time reflection?

I really like the functionality that macros enable, but at the same time, it’s also my least favourite part of Crystal to work with due to how unpolished it feels compared to everything else.

What if we require you to require all file dependencies up-front?

I don’t know, I really don’t have much of an opinion or care about how the require system works. I only have a tiny preference for requiring in each file because it makes it easier to find out where something comes from. But overall I am more than happy to roll with whatever when it comes to it.

without turning the language into Go, Rust, or another language.

Yes, please. I have not tried Rust but I do know that I don’t like programming in Go.

3 Likes

Being able to compile shared libraries in Crystal really easily would be a very nice feature to have.

4 Likes

Part 3 is up! Incremental compilation for Crystal - Part 3 - DEV Community 👩‍💻👨‍💻

Part 4 will likely not come out tomorrow as I’ll need time to code this 🙂

10 Likes

I wouldn’t say “a no-go”, but rather, a difficult task.

2 Likes

I haven’t been this hooked since Twin Peaks!

4 Likes

I haven’t seen Twin Peaks (though I heard it’s great) so I can’t comment much on that, but maybe this is more similar to Lost in that I’m writing things as I go. So be aware that anything can happen in the final episode 😁

8 Likes