Programmatic AST construction as a supported use case

Several projects in the Crystal ecosystem work with the compiler’s AST from outside the compiler itself: the language server (crystalline), linters (ameba), documentation generators, and transpilers like railcar (rubys/railcar on GitHub), which translates Ruby to Crystal. Each of these independently discovers the same friction points and invents similar workarounds.

I’d like to start a discussion about what it would look like if Crystal treated programmatic AST construction and analysis as a supported use case. I’m not looking for near-term changes — just exploring whether this direction interests the community and what the right API surface might be.

For context, I recently filed crystal-lang/crystal#16822 proposing that semantic analysis be decoupled from LLVM. That removes the heaviest barrier to using the compiler as a library. The questions below are about what comes next — once tools can load the semantic phase, what API should they use?

Construction helpers

Building a simple x = 1 + 2 programmatically requires:

Crystal::Assign.new(
  Crystal::Var.new("x"),
  Crystal::Call.new(
    Crystal::NumberLiteral.new("1").as(Crystal::ASTNode),
    "+",
    [Crystal::NumberLiteral.new("2").as(Crystal::ASTNode)]
  )
)

The .as(Crystal::ASTNode) casts are needed because array literals infer a concrete element type that doesn’t match the Array(ASTNode) parameter. Every constructor call requires this ceremony. Railcar has 40 of these casts across 19 files. A builder API, factory methods that return ASTNode, or covariant array handling would eliminate this.
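As a rough illustration of the factory-method option, the sketch below wraps the constructors used above in a small module. The `ASTBuilder` module and its `num`, `var`, `call`, and `assign` helpers are hypothetical names and not part of Crystal; the only assumption about the environment is that `require "compiler/crystal/syntax"` resolves in your installation.

```crystal
require "compiler/crystal/syntax"

# Hypothetical helper module; none of these methods exist in Crystal itself.
module ASTBuilder
  extend self

  def num(value : Int32) : Crystal::ASTNode
    Crystal::NumberLiteral.new(value.to_s)
  end

  def var(name : String) : Crystal::ASTNode
    Crystal::Var.new(name)
  end

  # Collects the splat into an Array(Crystal::ASTNode) so callers never cast.
  def call(receiver : Crystal::ASTNode, name : String, *args : Crystal::ASTNode) : Crystal::ASTNode
    nodes = [] of Crystal::ASTNode
    args.each { |arg| nodes << arg }
    Crystal::Call.new(receiver, name, nodes)
  end

  def assign(target : Crystal::ASTNode, value : Crystal::ASTNode) : Crystal::ASTNode
    Crystal::Assign.new(target, value)
  end
end

# The same tree as above, built without any explicit casts.
node = ASTBuilder.assign(
  ASTBuilder.var("x"),
  ASTBuilder.call(ASTBuilder.num(1), "+", ASTBuilder.num(2))
)
puts node
```

Because every helper declares `Crystal::ASTNode` as its return type, the casts are confined to one place. A real builder would need to cover many more node types, which is part of what this discussion is about.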

Serialization of abstract trees

Crystal’s ToSVisitor has overloads for every concrete node type but not the abstract ASTNode base. When the parser builds a tree, each node has a concrete compile-time type, so dispatch works. When external code builds trees, variables hold ASTNode references, and serialization fails.

The fix is a single overload:

class Crystal::ToSVisitor
  def visit(node : Crystal::ASTNode)
    node.accept(self)
    false
  end
end

The fix is small, but the principle it establishes matters: trees built at runtime should serialize the same as trees built by the parser.

Semantic analysis as a library

With the LLVM decoupling from #16822, semantic analysis can run standalone. But the API is awkward for tool use:

  • You must use Compiler with no_codegen = true to get the prelude loaded and requires resolved. There’s no lighter-weight entry point.
  • There’s no way to analyze a fragment (a single method or expression) without compiling a full program.
  • Type information is attached to nodes via .type? but there’s no documented query API — tools walk the AST and inspect nodes directly, hoping the layout hasn’t changed between Crystal versions.
  • Error reporting assumes compilation to a binary, not analysis for a tool.
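For comparison, the current route described in the first and third bullets looks roughly like the sketch below. It is untested against any particular Crystal release and assumes that the full compiler sources and LLVM development libraries are available, so that `require "compiler/requires"` builds; depending on the version, extra build flags may also be needed. The `AssignTypes` visitor is a hypothetical name introduced for illustration.

```crystal
# Assumes the full compiler sources (and LLVM) are available to `require`.
require "compiler/requires"

# Hypothetical visitor: records "variable name => type" for each typed
# assignment to a local variable.
class AssignTypes < Crystal::Visitor
  getter types = {} of String => String

  def visit(node : Crystal::Assign)
    target = node.target
    if target.is_a?(Crystal::Var) && (type = node.value.type?)
      types[target.name] = type.to_s
    end
    true
  end

  # Catch-all so the walk continues through every other node type.
  def visit(node : Crystal::ASTNode)
    true
  end
end

compiler = Crystal::Compiler.new
compiler.no_codegen = true # run parsing and semantic analysis only

source = Crystal::Compiler::Source.new("app.cr", "x = 1 + 2\n")
result = compiler.compile(source, "unused_output")

visitor = AssignTypes.new
result.node.accept(visitor)
puts visitor.types["x"]?
```

Even this small query pulls in the prelude and the whole compiler, and it reads types by walking nodes and calling `type?` directly, which is the coupling the bullets above describe.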

A library-oriented API might look something like:

analyzer = Crystal::SemanticAnalyzer.new
analyzer.add_source("app.cr", source_code)
result = analyzer.analyze

result.type_of("x")  # => Int32
result.method_return_type("MyClass", "foo")  # => String

I’m not proposing this specific API — just illustrating what “designed for tool use” might look like versus the current “reach into compiler internals.”

Round-trip fidelity

Crystal.format(node.to_s) is currently the only way to get valid Crystal source from a programmatically constructed AST. This means: construct AST → serialize to string → re-parse to validate → format. If the string representation of a constructed node isn’t parseable (edge cases in interpolation, operator precedence, etc.), there’s no way to fix it without string manipulation. A direct AST → formatted source path would close this loop.
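The loop described above can be sketched as follows. `render_checked` is a hypothetical helper, not an existing API; it assumes `require "compiler/crystal/syntax"` and `require "compiler/crystal/tools/formatter"` resolve in your installation, and it treats structural equality of the re-parsed tree as the fidelity check.

```crystal
require "compiler/crystal/syntax"
require "compiler/crystal/tools/formatter"

# Hypothetical helper: returns formatted source, or nil when the node's
# string form fails to parse or parses to a different tree.
def render_checked(node : Crystal::ASTNode) : String?
  source = node.to_s
  reparsed = Crystal::Parser.parse(source)
  return nil unless reparsed == node
  Crystal.format(source)
rescue Crystal::SyntaxException
  nil
end

node = Crystal::Parser.parse("x = 1 + 2")
puts render_checked(node)
```

When this returns nil, the only recourse today is to patch the intermediate string, which is the gap a direct AST-to-source path would close.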

AST stability

Crystal’s AST node types change between releases — fields get added, renamed, or restructured. This is fine for an internal data structure but makes external tools fragile. Even minimal guarantees would help: a changelog for AST-breaking changes, or a versioned subset of node types that tools can depend on.

Who benefits

Every Crystal tool that does more than parsing hits some subset of these walls:

  • crystalline (language server) — needs semantic analysis without codegen, needs to query types
  • ameba (linter) — would benefit from type information on nodes
  • Documentation generators — walk typed ASTs
  • Code generators / macro libraries — construct ASTs programmatically
  • Transpilers (railcar) — needs all of the above

Each project independently works around the same limitations. A supported API would consolidate those workarounds into one maintained surface.

Questions for discussion

  • Is there interest in treating the AST and semantic phase as a library API?
  • Which of these friction points do other tool authors also encounter?
  • Are there concerns about stability commitments for compiler internals?
  • Would an RFC be the right next step, or is this better addressed incrementally through individual issues?

I’m happy to contribute implementation work. The LLVM stub proof of concept (see stubs/llvm/src/llvm.cr in the railcar repo) demonstrates that the semantic phase already works standalone with minimal shimming — the question is whether Crystal wants to officially support that path.

Related prior discussion

A 2020 forum thread (“What would it take to be able to pass a stream of AST nodes to the compiler directly?”) explored a similar desire from the macro side. PR #8836 (user-defined macro methods operating on AST nodes) was prototyped but not merged. The approach here is complementary — rather than extending the macro system, it’s about making the existing AST and semantic infrastructure accessible to external tools.


This is a very valuable discussion. Thanks for bringing it up!

I believe the primary issue is that the compiler source code has traditionally been considered an implementation detail only relevant for compiler developers.
There are no proper API docs, no changelog entries about API changes, etc.

I don’t think we can make the same stability guarantees as with the language and stdlib.
Breaking changes in particular are relatively common and necessary, because the compiler is a complex piece of organically grown software and requires frequent refactoring. I don’t think we can avoid that. But that’s probably fine, as long as we properly document the changes so that consumers of the compiler API can keep up.

These topics are not just relevant for external tooling using the compiler, but also the compiler itself (and compiler tools).

For example, the compiler’s spec suite is bloated with thousands of ASTNode type annotations. It would be really nice to have a more convenient API.
A possible language-side solution for that would be array autocasting (see crystal-lang/crystal#10188, “Autocasting of array literals”).
But maybe we should just add overloads that implicitly cast arrays to Array(ASTNode), even if that means additional allocations?
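A minimal sketch of what such an implicit cast amounts to, written here as a standalone helper rather than as a compiler overload. `to_nodes` is a hypothetical name; the copy it makes is the additional allocation mentioned above.

```crystal
require "compiler/crystal/syntax"

# Hypothetical helper: copies any array of nodes into an
# Array(Crystal::ASTNode). The new array is the extra allocation.
def to_nodes(items : Array(T)) : Array(Crystal::ASTNode) forall T
  items.map { |item| item.as(Crystal::ASTNode) }
end

# Without an `of` clause the literal is inferred as Array(Crystal::NumberLiteral),
# which does not satisfy an Array(Crystal::ASTNode) restriction.
literals = [Crystal::NumberLiteral.new("2")]

call = Crystal::Call.new(Crystal::NumberLiteral.new("1"), "+", to_nodes(literals))
puts call.args.class
```

An overload inside the compiler could do the same mapping internally, so callers would pass the narrower array directly.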

It sounds very reasonable that ASTNode#to_s should produce well-formatted code. :+1:
Most of the time it already does.


Related issue in ameba: crystal-ameba/ameba#513, “Expand ameba's functionality with semantic information”.


Fair enough on the pushback on the request for a stable AST.

What I’m exploring has a hard dependency on the AST, and I accept that’s on me. The reality is that Crystal is mature, and with maturity comes a natural inertia. I can also mitigate that by monitoring and participating.

I’m willing to contribute; crystal-lang/crystal#16823 (“Move CallConvention and Const properties out of LLVM/codegen”) is one example. Let me know how best to proceed.


Having a builder API seems reasonable. It can start as a shard for faster iteration. I think in specs we only have convenient builders for some LLVM IR, like LLVMBuilderHelper.

In a transpiler like railcar I can see the benefit of such a builder: it avoids string manipulation.
It’s definitely useful if such an AST can be pretty-printed.

Round-trip fidelity has been an issue for the Crystal playground, and having an easy way to prompt the user to submit an issue has allowed us to iterate a lot in the past. I think we are stable now, but there is no guarantee that new syntax constructs are handled correctly.

For more internal features like semantic analysis, it is harder to draw a line as things are right now. So I would focus on AST manipulation first.

Having a builder API is a convenient way to offer stability that does not depend on how the types are actually named and constructed.

I am not sure that serialization of AST nodes needs to be pretty-printed. Sometimes you want to avoid ambiguity when viewing a value: is a + b + c grouped as a + (b + c) or as (a + b) + c?
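As a sketch of such an unambiguous view, the helper below prints binary operator calls with explicit parentheses so the grouping in the tree is visible. `explicit` and `BINARY_OPERATORS` are hypothetical names, and the operator list is a deliberately small, illustrative subset.

```crystal
require "compiler/crystal/syntax"

# Illustrative subset of binary operators; not an exhaustive list.
BINARY_OPERATORS = %w(+ - * / % ** == != < <= > >= << >> & | ^)

# Hypothetical debug printer: wraps every binary operator call in parentheses
# so the tree's grouping is visible; other nodes fall back to `to_s`.
def explicit(node : Crystal::ASTNode) : String
  if node.is_a?(Crystal::Call) && (obj = node.obj) &&
     node.args.size == 1 && BINARY_OPERATORS.includes?(node.name)
    "(#{explicit(obj)} #{node.name} #{explicit(node.args[0])})"
  else
    node.to_s
  end
end

puts explicit(Crystal::Parser.parse("a + b + c")) # prints "((a + b) + c)"
```

A debugging serializer like this can live alongside `to_s` rather than replace it, so the pretty-printed form stays available for generating source.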

At some point I used bcardiff/crystal-ast-helper (a helper tool for debugging the parser and formatter, on GitHub) to dive deeper into the formatter and parser. I encourage you to build whatever tool helps you iterate more efficiently.