The weight of compiler tools

straight-shoota · May 30, 2024, 4:37pm

The Crystal compiler includes a number of different tools that are helpful for Crystal development.

Most of them are part of the compiler because they use compiler internals for code analysis or similar things. And most of them are fairly simple programs that more or less query some particular insights from the AST.
It makes a lot of sense to have them included in the compiler because they require the compiler source anyway, and they don’t add that much weight.

But some things do add weight.

For example, the business end of crystal doc is basically a domain-specific static site generator that produces HTML output.
crystal play is an HTTP webserver which starts a compilation process on API request.

These are not really core features of a program that’s intended to compile source code.

Merging all of this into a single program adds a pile of complexity.

Every time we build the compiler, we’re building an HTML generator and a web server.
Especially for development builds, that’s pretty wasteful because we usually don’t care about the doc generator or the playground when working on compiler features.

With this patch to disable building some of the tools, we get a nice performance improvement (about 10% on my machine) when we only need a new development build of the compiler itself. It would certainly make a lot of sense to use that when working on the compiler.

There’s currently a discussion on a feature request for HTML sanitization for the doc generator. HTML parsing requires libxml2, so that would add another dependency to the compiler. And it’s a dependency that has nothing to do with compiling code.
As mentioned there I believe it would be a good idea to extract all parts that are not directly related to the core business of a compiler into their own programs.
I don’t think there’s a compelling reason to have a web server running directly in the compiler process, or an HTML generator (which might soon also do sanitization).
These could very well be their own programs which interact with the compiler program through a well-defined interface.
For crystal play I imagine this should be pretty straightforward by replacing compiler.compile with Process.run("crystal build"). Of course there’s a bit more involved, but the interface already exists and it’s pretty small.
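A minimal sketch of what that swap could look like for crystal play. The helper name build_with_external_compiler and its parameters are hypothetical and not taken from the playground source; it only assumes that a crystal executable is on the PATH and that crystal build accepts -o for the output path.

```crystal
# Hypothetical helper (not the playground's actual code): compile `source`
# by shelling out to an external compiler binary instead of calling
# compiler internals in the same process.
def build_with_external_compiler(source : String, output_path : String, compiler : String = "crystal") : Bool
  # Write the snippet to a temporary file so the external process can read it.
  source_file = File.tempname("play", ".cr")
  File.write(source_file, source)

  begin
    status = Process.run(
      compiler,
      ["build", source_file, "-o", output_path],
      output: Process::Redirect::Close,
      error: Process::Redirect::Inherit # let compile errors show up on our stderr
    )
    status.success?
  ensure
    # Remove the temporary source file whether or not the build succeeded.
    File.delete(source_file)
  end
end
```

A caller would then run the resulting binary at output_path itself. A real playground would also need to capture the compiler’s error output and report it back over its API, which this sketch leaves out.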
Extracting crystal doc requires a bit more work to establish a clear structured data format for interchange (Improved Crystal API model mapping · Issue #6947 · crystal-lang/crystal · GitHub). But doable.

I merely think of this as an internal refactor. It should be pretty much invisible to the user. crystal play and crystal doc should continue to work as they did before. Just that the compiler delegates to a different program (which then in turn runs the compiler again when it needs it).
We might keep developing these tools in the main repo or gradually extract them into their own repos that can evolve independently (similar to shards).

Thanks to @ggiraldez for bringing up this topic in a private conversation.

nobodywasishere · May 30, 2024, 6:36pm

I think this is a good idea! I’m imagining the docs generator could be separated out into its own crystal-lang/crystal-doc repo that the main compiler repo points to with a submodule. All crystal doc would have to do is call the crystal-doc binary, and there could be separate Makefile rules to build crystal-doc and put it in the right place. There could then be a crystal tool dump-info (or similar) that crystal-doc and other tools could build on.

ysbaddaden · May 30, 2024, 8:21pm

I’ve been wishing (and probably voiced it myself years ago) for the crystal binary to instead be a myriad of smaller binaries, so a huge +1 from me.

luislavena · May 30, 2024, 9:43pm

I was going to comment on this too, and actually @ysbaddaden explored this as part of his runic-lang experiment: compiler/Makefile at master · runic-lang/compiler · GitHub

Where the entry point for the compiler was runic and the different subcommands were separate executables, similar to how Git gets extended and operates.
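For readers who have not seen the Git approach, here is a rough sketch of that kind of dispatch in Crystal. The crystal-<name> naming convention and both method names are assumptions made for illustration; this is not how the crystal command or runic currently resolves subcommands.

```crystal
# Hypothetical convention: `crystal foo` is served by an executable named
# `crystal-foo` found on PATH (modelled on Git, not on current Crystal behavior).
def external_subcommand(name : String) : String?
  Process.find_executable("crystal-#{name}")
end

# Replace the current process with the external tool, forwarding the
# remaining arguments. Aborts when no matching executable exists.
def dispatch(args : Array(String))
  name = args.first? || abort("usage: crystal <command> [args]")

  if path = external_subcommand(name)
    Process.exec(path, args[1..])
  else
    abort "unknown command: #{name}"
  end
end
```

An entry point would call dispatch(ARGV) after checking its built-in commands first, so core commands such as build never depend on what happens to be on the PATH.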

PS: I’m known to stalk Julien’s repositories on GitHub, I promise it’s all with good intentions

zw963 · May 31, 2024, 11:43am

I have used Crystal for two years, but I never use the play feature locally, because I definitely don’t need it.

zw963 · May 31, 2024, 11:46am

But I use crystal doc a lot. When Crystal releases a new version, I build the compiler from source and always generate the docs locally, so I can browse them quickly in Firefox even without a network connection.

straight-shoota · May 31, 2024, 3:37pm

A post was split to a new topic: Object inspection helpers

jwoertink · May 31, 2024, 4:01pm

Definitely a +1 from me. It would be kind of neat if the compiler had some sort of pluggable CLI interface where external tools could be registered somehow. It’s a half-baked idea, but I’m thinking that if someone in the community (maybe even ameba??) had a tool that could be “installed” and registered, then you could have an interface that looks like

crystal register path/to/somewhere/maybe/github?
crystal register crystal-lang/crystal-doc
crystal doc

crystal register crystal-ameba/ameba
crystal ameba

Sure, it would be very complicated, and would probably still require adding some extra dependencies like an HTTP client or whatever. Or possibly just hooking into a git clone… However, something along these lines could make for a pretty extensible ecosystem. I’m sure whatever y’all land on will be great though. Just wanted to add some extra ideas.

bcardiff · May 31, 2024, 5:17pm

I think that a first step would be to treat the crystal-lang/crystal repo as a monorepo of the compiler and tools.

Later we can extract the tools to separate repos if needed. Yet I think there are pros to keeping them in the same repo. And we can still open the game up by letting external tools register in the crystal CLI.

Xen · June 2, 2024, 8:27pm

Sounds like a good idea.

I was going to suggest the git model. Docker has done it too. As long as crystal -h lists the external subcommands for discoverability.
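One way the help output could discover such subcommands is to scan the PATH, sketched below. The crystal- prefix, the constant, and the method name are hypothetical; this only illustrates the idea and is not an existing crystal feature.

```crystal
# Hypothetical prefix for external tools, following the Git model.
EXTERNAL_PREFIX = "crystal-"

# Collect the names of all executables on `path` that start with the prefix,
# so a help screen could list them next to the built-in commands.
def external_subcommands(path : String = ENV["PATH"]? || "") : Array(String)
  names = [] of String

  path.split(Process::PATH_DELIMITER).each do |dir|
    next unless Dir.exists?(dir)

    Dir.each_child(dir) do |entry|
      next unless entry.starts_with?(EXTERNAL_PREFIX)

      full_path = File.join(dir, entry)
      # Skip directories and files without the executable bit.
      next unless File.file?(full_path) && File.executable?(full_path)

      names << entry.lchop(EXTERNAL_PREFIX)
    end
  end

  # The same tool may appear in several PATH entries; list it once.
  names.uniq.sort
end
```

A help command could print these names under a separate heading so users can tell built-in commands apart from external ones.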

hugopl · June 3, 2024, 8:26pm

I was relieved when I read this part, but yes, transparently splitting this into small programs seems like a good idea.