Several projects in the Crystal ecosystem work with the compiler’s AST from outside the compiler itself: the language server (crystalline), linters (ameba), documentation generators, and transpilers like railcar (rubys/railcar on GitHub), which translates Ruby to Crystal. Each of these independently discovers the same friction points and invents similar workarounds.
I’d like to start a discussion about what it would look like if Crystal treated programmatic AST construction and analysis as a supported use case. I’m not looking for near-term changes — just exploring whether this direction interests the community and what the right API surface might be.
For context, I recently filed crystal-lang/crystal#16822 proposing that semantic analysis be decoupled from LLVM. That removes the heaviest barrier to using the compiler as a library. The questions below are about what comes next — once tools can load the semantic phase, what API should they use?
Construction helpers
Building a simple x = 1 + 2 programmatically requires:
Crystal::Assign.new(
  Crystal::Var.new("x"),
  Crystal::Call.new(
    Crystal::NumberLiteral.new("1").as(Crystal::ASTNode),
    "+",
    [Crystal::NumberLiteral.new("2").as(Crystal::ASTNode)]
  )
)
The .as(Crystal::ASTNode) casts are needed because array literals infer a concrete element type that doesn’t match the Array(ASTNode) parameter. Every constructor call requires this ceremony. Railcar has 40 of these casts across 19 files. A builder API, factory methods that return ASTNode, or covariant array handling would eliminate this.
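As one illustration of the factory-method option, a thin helper module can declare ASTNode as the return type of every constructor wrapper, so call sites need no casts. This is only a sketch: the module name ASTBuild and its methods (num, var, call, assign) are hypothetical and not part of the compiler, and it assumes the compiler sources are on the require path and that Call.new accepts an Array(ASTNode) as in the snippet above.

```crystal
require "compiler/crystal/syntax"

# Hypothetical helpers (not part of the compiler): every method declares
# Crystal::ASTNode as its return type, so callers never write `.as(...)`.
module ASTBuild
  extend self

  def num(value : Int) : Crystal::ASTNode
    Crystal::NumberLiteral.new(value.to_s)
  end

  def var(name : String) : Crystal::ASTNode
    Crystal::Var.new(name)
  end

  def call(obj : Crystal::ASTNode, name : String, *args : Crystal::ASTNode) : Crystal::ASTNode
    # Collect the splat into an explicitly typed array once, in one place.
    list = [] of Crystal::ASTNode
    args.each { |arg| list << arg }
    Crystal::Call.new(obj, name, list)
  end

  def assign(name : String, value : Crystal::ASTNode) : Crystal::ASTNode
    Crystal::Assign.new(var(name), value)
  end
end

# Builds the same tree as the constructor-based example for `x = 1 + 2`.
node = ASTBuild.assign("x", ASTBuild.call(ASTBuild.num(1), "+", ASTBuild.num(2)))
puts node
```

The typed array is created inside the helper, which is the same ceremony as before, just paid once instead of at every call site.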
Serialization of abstract trees
Crystal’s ToSVisitor has overloads for every concrete node type but not the abstract ASTNode base. When the parser builds a tree, each node has a concrete compile-time type, so dispatch works. When external code builds trees, variables hold ASTNode references, and serialization fails.
The fix is a single overload:
class Crystal::ToSVisitor
  def visit(node : Crystal::ASTNode)
    node.accept(self)
    false
  end
end
The fix is small, but the principle it establishes matters: trees built at runtime should serialize the same as trees built by the parser.
Semantic analysis as a library
With the LLVM decoupling from #16822, semantic analysis can run standalone. But the API is awkward for tool use:
- You must use Compiler with no_codegen = true to get the prelude loaded and requires resolved. There’s no lighter-weight entry point.
- There’s no way to analyze a fragment (a single method or expression) without compiling a full program.
- Type information is attached to nodes via .type? but there’s no documented query API: tools walk the AST and inspect nodes directly, hoping the layout hasn’t changed between Crystal versions.
- Error reporting assumes compilation to a binary, not analysis for a tool.
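For reference, the current entry point from the first bullet looks roughly like the sketch below. It assumes the full compiler can be required via compiler/requires (which today still pulls in LLVM at build time) and uses an inline one-line program as a stand-in for real source; the file name app.cr and output name app are placeholders.

```crystal
require "compiler/requires"

# A stand-in program; a real tool would read this from disk.
source = Crystal::Compiler::Source.new("app.cr", "x = 1 + 2\n")

compiler = Crystal::Compiler.new
compiler.no_codegen = true # run parsing and semantic analysis, skip codegen

# The second argument is the output filename a normal build would produce.
result = compiler.compile(source, "app")

# result.program is the analyzed program, result.node the top-level AST.
# Type information has to be read off individual nodes with `type?`.
puts result.node.type?
```

Everything past this point is tool-specific tree walking, which is where the lack of a documented query API shows up.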
A library-oriented API might look something like:
analyzer = Crystal::SemanticAnalyzer.new
analyzer.add_source("app.cr", source_code)
result = analyzer.analyze
result.type_of("x") # => Int32
result.method_return_type("MyClass", "foo") # => String
I’m not proposing this specific API — just illustrating what “designed for tool use” might look like versus the current “reach into compiler internals.”
Round-trip fidelity
Crystal.format(node.to_s) is currently the only way to get valid Crystal source from a programmatically constructed AST. This means: construct AST → serialize to string → re-parse to validate → format. If the string representation of a constructed node isn’t parseable (edge cases in interpolation, operator precedence, etc.), there’s no way to fix it without string manipulation. A direct AST → formatted source path would close this loop.
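The loop described above can be sketched as follows. The render helper is hypothetical, and the sketch assumes the formatter can be loaded with require "compiler/crystal/formatter"; the tree here comes from the parser only to keep the example short, and a programmatically built node would go through the same steps.

```crystal
require "compiler/crystal/syntax"
require "compiler/crystal/formatter"

# Hypothetical helper: serialize, check that the text re-parses, then format.
def render(node : Crystal::ASTNode) : String
  source = node.to_s
  Crystal::Parser.parse(source) # raises if the serialized text is not valid Crystal
  Crystal.format(source)
end

tree = Crystal::Parser.parse("x=1+2")
puts render(tree)
```

If the re-parse step raises, the only recourse today is to patch the intermediate string, which is the gap a direct AST-to-source path would close.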
AST stability
Crystal’s AST node types change between releases — fields get added, renamed, or restructured. This is fine for an internal data structure but makes external tools fragile. Even minimal guarantees would help: a changelog for AST-breaking changes, or a versioned subset of node types that tools can depend on.
Who benefits
Every Crystal tool that does more than parsing hits some subset of these walls:
- crystalline (language server): needs semantic analysis without codegen and a way to query types
- ameba (linter): would benefit from type information on nodes
- Documentation generators: walk typed ASTs
- Code generators / macro libraries: construct ASTs programmatically
- Transpilers (railcar): need all of the above
Each project independently works around the same limitations. A supported API would consolidate those workarounds into one maintained surface.
Questions for discussion
- Is there interest in treating the AST and semantic phase as a library API?
- Which of these friction points do other tool authors also encounter?
- Are there concerns about stability commitments for compiler internals?
- Would an RFC be the right next step, or is this better addressed incrementally through individual issues?
I’m happy to contribute implementation work. The LLVM stub proof of concept (see stubs/llvm/src/llvm.cr in the railcar repo) demonstrates that the semantic phase already works standalone with minimal shimming — the question is whether Crystal wants to officially support that path.
Related prior discussion
A 2020 forum thread (“What would it take to be able to pass a stream of AST nodes to the compiler directly?”) explored a similar desire from the macro side. PR #8836 (user-defined macro methods operating on AST nodes) was prototyped but not merged. The approach here is complementary — rather than extending the macro system, it’s about making the existing AST and semantic infrastructure accessible to external tools.