Programmatic AST construction as a supported use case

rubys · April 10, 2026, 2:47pm

Several projects in the Crystal ecosystem work with the compiler’s AST from outside the compiler itself: the language server (crystalline), linters (ameba), documentation generators, and transpilers like railcar (rubys/railcar on GitHub: a Rails-to-Crystal transpiler, Rails-compatible Crystal framework, and RBS type signature generator), which translates Ruby to Crystal. Each of these independently discovers the same friction points and invents similar workarounds.

I’d like to start a discussion about what it would look like if Crystal treated programmatic AST construction and analysis as a supported use case. I’m not looking for near-term changes — just exploring whether this direction interests the community and what the right API surface might be.

For context, I recently filed crystal-lang/crystal#16822 proposing that semantic analysis be decoupled from LLVM. That removes the heaviest barrier to using the compiler as a library. The questions below are about what comes next — once tools can load the semantic phase, what API should they use?

Construction helpers

Building a simple x = 1 + 2 programmatically requires:

Crystal::Assign.new(
  Crystal::Var.new("x"),
  Crystal::Call.new(
    Crystal::NumberLiteral.new("1").as(Crystal::ASTNode),
    "+",
    [Crystal::NumberLiteral.new("2").as(Crystal::ASTNode)]
  )
)

The .as(Crystal::ASTNode) casts are needed because array literals infer a concrete element type that doesn’t match the Array(ASTNode) parameter. Every constructor call requires this ceremony. Railcar has 40 of these casts across 19 files. A builder API, factory methods that return ASTNode, or covariant array handling would eliminate this.
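To make the builder idea concrete, here is a minimal sketch of what such factory methods could look like. It assumes only the syntax part of the compiler is required; the ASTBuilder module and its method names (num, var, call, assign) are hypothetical and not part of Crystal. Each helper declares ASTNode as its return type and builds the argument array as Array(ASTNode) internally, so callers never write a cast.

```crystal
require "compiler/crystal/syntax"

# Hypothetical helper module for illustration; nothing like it ships with Crystal.
module ASTBuilder
  extend self

  # Every helper returns the abstract ASTNode type, so results compose freely.
  def num(value : Int) : Crystal::ASTNode
    Crystal::NumberLiteral.new(value.to_s)
  end

  def var(name : String) : Crystal::ASTNode
    Crystal::Var.new(name)
  end

  def call(receiver : Crystal::ASTNode, name : String, *args : Crystal::ASTNode) : Crystal::ASTNode
    # Collect the splat into an explicitly typed array, which is what Call expects.
    list = [] of Crystal::ASTNode
    args.each { |arg| list << arg }
    Crystal::Call.new(receiver, name, list)
  end

  def assign(target : Crystal::ASTNode, value : Crystal::ASTNode) : Crystal::ASTNode
    Crystal::Assign.new(target, value)
  end
end

# x = 1 + 2, built without a single .as(Crystal::ASTNode)
tree = ASTBuilder.assign(
  ASTBuilder.var("x"),
  ASTBuilder.call(ASTBuilder.num(1), "+", ASTBuilder.num(2))
)
puts tree
```

A wrapper like this could also live in a shard, which would let the helper names stay stable even if the underlying constructors change.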

Serialization of abstract trees

Crystal’s ToSVisitor has overloads for every concrete node type but not the abstract ASTNode base. When the parser builds a tree, each node has a concrete compile-time type, so dispatch works. When external code builds trees, variables hold ASTNode references, and serialization fails.

The fix is a single overload:

class Crystal::ToSVisitor
  def visit(node : Crystal::ASTNode)
    node.accept(self)
    false
  end
end

The fix is small, but the principle it establishes matters: trees built at runtime should serialize the same as trees built by the parser.
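For comparison, visitors written outside the compiler usually carry a catch-all overload of their own so that a tree held in an ASTNode-typed variable can be traversed. A minimal sketch, assuming only the syntax part of the compiler is required; CallNameCollector is a hypothetical class made up for this illustration:

```crystal
require "compiler/crystal/syntax"

# Hypothetical visitor for illustration: collects the names of all calls in a tree.
class CallNameCollector < Crystal::Visitor
  getter names = [] of String

  def visit(node : Crystal::Call)
    @names << node.name
    true # keep descending into the receiver and the arguments
  end

  # Catch-all overload: accept every other node type and keep traversing.
  def visit(node : Crystal::ASTNode)
    true
  end
end

# The variable is deliberately typed as the abstract base class.
tree : Crystal::ASTNode = Crystal::Parser.parse("x = foo(1) + bar")

collector = CallNameCollector.new
tree.accept(collector)
puts collector.names.join(", ")
```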

Semantic analysis as a library

With the LLVM decoupling from #16822, semantic analysis can run standalone. But the API is awkward for tool use:

You must use Compiler with no_codegen = true to get the prelude loaded and requires resolved. There’s no lighter-weight entry point.
There’s no way to analyze a fragment (a single method or expression) without compiling a full program.
Type information is attached to nodes via .type? but there’s no documented query API — tools walk the AST and inspect nodes directly, hoping the layout hasn’t changed between Crystal versions.
Error reporting assumes compilation to a binary, not analysis for a tool.

A library-oriented API might look something like:

analyzer = Crystal::SemanticAnalyzer.new
analyzer.add_source("app.cr", source_code)
result = analyzer.analyze

result.type_of("x")  # => Int32
result.method_return_type("MyClass", "foo")  # => String

I’m not proposing this specific API — just illustrating what “designed for tool use” might look like versus the current “reach into compiler internals.”

Round-trip fidelity

Crystal.format(node.to_s) is currently the only way to get valid Crystal source from a programmatically constructed AST. This means: construct AST → serialize to string → re-parse to validate → format. If the string representation of a constructed node isn’t parseable (edge cases in interpolation, operator precedence, etc.), there’s no way to fix it without string manipulation. A direct AST → formatted source path would close this loop.
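The current pipeline can be sketched roughly as follows. This assumes the formatter can be required on its own via compiler/crystal/tools/formatter; the render helper is a hypothetical name, and returning nil on a parse failure is just one possible way to surface the problem.

```crystal
require "compiler/crystal/syntax"
require "compiler/crystal/tools/formatter"

# Hypothetical helper: returns formatted source for a constructed node,
# or nil if the serialized form of the node does not parse.
def render(node : Crystal::ASTNode) : String?
  source = node.to_s            # 1. serialize the tree to a string
  Crystal::Parser.parse(source) # 2. re-parse to check the string is valid Crystal
  Crystal.format(source)        # 3. format the validated source
rescue Crystal::SyntaxException
  nil
end

node = Crystal::Call.new(
  Crystal::NumberLiteral.new("1"),
  "+",
  [Crystal::NumberLiteral.new("2")] of Crystal::ASTNode
)
puts render(node) || "serialized form did not parse"
```

When render returns nil, the only remedy today is patching the string by hand, which is the gap described above.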

AST stability

Crystal’s AST node types change between releases — fields get added, renamed, or restructured. This is fine for an internal data structure but makes external tools fragile. Even minimal guarantees would help: a changelog for AST-breaking changes, or a versioned subset of node types that tools can depend on.

Who benefits

Every Crystal tool that does more than parsing hits some subset of these walls:

crystalline (language server) — needs semantic analysis without codegen, needs to query types
ameba (linter) — would benefit from type information on nodes
Documentation generators — walk typed ASTs
Code generators / macro libraries — construct ASTs programmatically
Transpilers (railcar) — needs all of the above

Each project independently works around the same limitations. A supported API would consolidate those workarounds into one maintained surface.

Questions for discussion

Is there interest in treating the AST and semantic phase as a library API?
Which of these friction points do other tool authors also encounter?
Are there concerns about stability commitments for compiler internals?
Would an RFC be the right next step, or is this better addressed incrementally through individual issues?

I’m happy to contribute implementation work. The LLVM stub proof of concept (see stubs/llvm/src/llvm.cr in the railcar repo) demonstrates that the semantic phase already works standalone with minimal shimming — the question is whether Crystal wants to officially support that path.

Related prior discussion

A 2020 forum thread (“What would it take to be able to pass a stream of AST nodes to the compiler directly?”) explored a similar desire from the macro side. PR #8836 (user-defined macro methods operating on AST nodes) was prototyped but not merged. The approach here is complementary — rather than extending the macro system, it’s about making the existing AST and semantic infrastructure accessible to external tools.

straight-shoota · April 10, 2026, 4:48pm

This is a very valuable discussion. Thanks for bringing it up!

I believe the primary issue is that the compiler source code has traditionally been considered an implementation detail only relevant for compiler developers.
There are no proper API docs, no changelog entries about API changes, etc.

I don’t think we can make the same stability guarantees as with the language and stdlib.
Breaking changes in particular are relatively common and necessary, because the compiler is a complex piece of organically grown software and requires frequent refactoring. I don’t think we can avoid that. But that’s probably fine, as long as we properly document the changes so that consumers of the compiler API can keep up.

These topics are not just relevant for external tooling using the compiler, but also the compiler itself (and compiler tools).

For example, the compiler’s spec suite is bloated with thousands of ASTNode type annotations. It would be really nice to have a more convenient API.
A possible language-side solution for that would be array autocasting (see “Autocasting of array literals”, issue #10188 in crystal-lang/crystal).
But maybe we should just add overloads that implicitly cast arrays to Array(ASTNode), even if that means additional allocations?

It sounds very reasonable that ASTNode#to_s should produce well-formatted code.
Most of the time it already does.

Related issue in ameba: “Expand ameba's functionality with semantic information” (issue #513 in crystal-ameba/ameba).

rubys · April 10, 2026, 5:45pm

Fair enough on the pushback on the request for a stable AST.

What I’m exploring has a hard dependency on the AST, and I accept that’s on me. The reality is that Crystal is mature, and maturity brings a natural inertia. I can also mitigate the risk by monitoring and participating.

I’m willing to contribute; pull request #16823 in crystal-lang/crystal (“Move CallConvention and Const properties out of LLVM/codegen”) is an example. Let me know how best to proceed.

bcardiff · April 10, 2026, 6:40pm

Having a builder api seems reasonable. It can start as a shard for faster iteration. I think in specs we only have convenient builders for some LLVM IR like LLVMBuilderHelper.

In a Rails-to-Crystal transpiler I can see the benefit of such a builder: it avoids string manipulation.
It’s definitely useful if such an AST can be pretty-printed.

The round-trip fidelity has been an issue for crystal playground and having an easy way to prompt the user to submit an issue has allowed us to iterate a lot in the past. I think we are stable now, but there is no guarantee that new syntax constructs are handled accordingly.

For more internal features like semantic analysis, it is harder to draw a line as things are right now, so I would focus on AST manipulation first.

Having a builder api is a convenient way to offer stability beyond how the types are actually named and constructed.

I am not sure that serialization of AST nodes needs to be pretty-printed. Sometimes you want to avoid ambiguity when viewing a value: is a + b + c parsed as a + (b + c) or as (a + b) + c?
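As an illustration of an unambiguous view, a debug printer can render calls in prefix form so the nesting is explicit. This is only a sketch: the structure method is a hypothetical name, it special-cases Call nodes only, and it falls back to to_s for everything else.

```crystal
require "compiler/crystal/syntax"

# Hypothetical debug printer: renders calls in prefix form so nesting is explicit.
def structure(node : Crystal::ASTNode) : String
  case node
  when Crystal::Call
    parts = [node.name]
    if receiver = node.obj
      parts << structure(receiver)
    end
    node.args.each { |arg| parts << structure(arg) }
    "(#{parts.join(" ")})"
  else
    # Literals and any node type not handled above use the regular serialization.
    node.to_s
  end
end

puts structure(Crystal::Parser.parse("1 + 2 + 3")) # (+ (+ 1 2) 3)
puts structure(Crystal::Parser.parse("1 + 2 * 3")) # (+ 1 (* 2 3))
```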

At some point I used bcardiff/crystal-ast-helper (a helper tool on GitHub to debug the parser and formatter) to dive deeper into the formatter and parser. I encourage you to build whatever tool helps you iterate more efficiently.

kojix2 · April 11, 2026, 11:23am

astv.cr is a static site that parses Crystal snippets and visualizes the AST tree. It is built with WebAssembly and runs in the browser rather than on a server. There may still be some rough edges here and there, but it is reasonably usable.

A large part of this tool is also the product of vibe coding, so I have not really introduced it on the forum before.

What I really wanted to do was add the semantic phase as well, so that it could visualize how the tree changes as various visitors and passes run over it. However, as long as the compiler is tightly coupled to LLVM, I felt it would be hard to run that in the browser.

bcardiff · April 11, 2026, 11:39am

Wow! It looks very neat!

rubys · April 11, 2026, 12:17pm

kojix2: “However, as long as the compiler is tightly coupled to LLVM, I felt it would be hard to run that in the browser.”

Would it be helpful if I were to extract my llvm stub into a separate shard? Or were you looking for the full compiler? I can run semantic analysis with this stub.

I would post a link, but I’m not allowed to; I guess I’m too new to the community. In any case, it is on GitHub: rubys, railcar, stubs/llvm.

Note: the stub may need updates when Crystal’s stdlib LLVM bindings change across versions.

paulocoghi · April 13, 2026, 5:56am

@straight-shoota is it possible to lift these limitations for Sam’s account, @rubys?

bcardiff · April 13, 2026, 7:50am

I lifted the trust level. I think the restriction was due to being a new user.

straight-shoota · April 15, 2026, 9:07pm

It wasn’t intended as pushback. I believe the AST itself is actually relatively stable at this point.
Other parts of the compiler may have more movement.

Every minor release must be expected to have breaking changes in the compiler API.

straight-shoota · May 4, 2026, 8:25pm

We have an upcoming change in the lexer API: pull request #16748 in crystal-lang/crystal, “Standardize parsing of symbol and string array literals”.

Calls to Lexer#next_string_array_token need to be replaced with Lexer#next_string_token. The two have been merged into a single method with standardized behaviour.

I don’t think it’s possible to keep this backward compatible. In some use cases, delegating next_string_array_token to next_string_token might be fine, but there are semantic differences (next_string_token returns SPACE tokens, for example). So we cannot keep the old method as a deprecated alias.

We can use this example to discuss how we should document this change.

In the changelog, the primary focus is the effect on the parser, not the internal API change. (This effect is tagged as breaking because it affects escape behaviour, which was previously broken for some percent literals and now works correctly.)

Should we add a new section about Compiler API to the release notes? I suppose this might not be very interesting for most readers, though.

Sija · May 4, 2026, 8:52pm

That would be helpful for the tooling ecosystem, at the very least, by listing the breaking changes.
And since these changes are very rare, it shouldn’t really bother anyone.
