Tree-sitter Crystal

Tree-sitter is a parsing library that was just brought into Neovim 0.5, and I think some other editors use it too. It’s intended to be usable by any editor, so, as with LSP servers, editors can all share one common tool for a given language.

It is really cool since it does fast incremental parsing, so as you type it can reparse just the changed part of the tree, and it handles errors (such as a partially typed keyword, or whatever). This talk from the author is good: “Tree-sitter - a new parsing system for programming tools” (Strange Loop).

By having your editor be able to actually parse the code instead of just doing some complicated regexes, you get much better and faster syntax highlighting. But more than that it starts to enable refactoring tools that deal with the actual AST.

I started trying to write one for Crystal, but it’s a pretty daunting project, and I know I’ll never be able to finish it myself. If anyone is interested, I’ve put what little I’ve got so far (just symbols) at will/tree-sitter-crystal on GitHub. I think something like this would work well Rubinius-commit-bit style, where anyone who gets a PR merged gets commit access.

Hopefully it can eventually get into a state where it gets added to the http://github.com/tree-sitter org, since that seems to be the package manager of sorts for the grammars.

Amazing! There was another partial implementation I stumbled across recently which may be of interest: keidax/tree-sitter-crystal on GitHub.

Oh that’s cool. I did search around a bit before starting it, but didn’t find this one. Thanks!

I am very interested in this same idea – I also use nvim 0.5, and I’d also love a tree-sitter parser for Crystal. I’m not especially good at tree-sitter (I’ve tried poking at keidax’s grammar linked above recently with no real success).

But if I’ve got time and capability, I’ll have to shoot a PR your way with whatever I can get working.

I haven’t read anything about this, nor seen the video/presentation, but… does this have to be implemented in C, with all that grammar machinery? Would it be possible to reuse Crystal’s lexer and parser for this, which are written in Crystal?

You could maybe get by with some sort of trick compiling the parser down into a static lib and including that, but it’d be hard.

However, one of the main features is recovering from errors. Here you can see that in Ruby I put an error in the middle, but it was able to spot that error and continue parsing.

I’m not sure if the Crystal parser can do that?

Ah, I see. Yeah, the parser can’t do that, but it could be modified with a flag to allow those scenarios. Though it’s strange… where does the error live? It looks like there’s a binary AST node, but it has an ERROR right after left and before right.
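
To make the "where does the error live?" question concrete, here is a toy sketch of an error-tolerant parser for sums such as `1 + 2`. It is not tree-sitter's algorithm, and every name in it (`parseSum`, `toSexp`, the node types) is hypothetical. It only illustrates the shape described above: unexpected tokens are collected into an ERROR node that sits inside the enclosing binary node, between left and right, and parsing carries on.

```javascript
// Toy illustration only: NOT how tree-sitter recovers from errors.
const isInt = t => /^\d+$/.test(t);
const isPlus = t => t === '+';

function parseSum(src) {
  // Tokens: integers, '+', or any other run of non-space characters (junk).
  const tokens = src.match(/\d+|\+|[^\s\d+]+/g) || [];
  let pos = 0;

  // Skip tokens until `wanted` matches; return the skipped run as one ERROR node.
  function recover(wanted) {
    const start = pos;
    while (pos < tokens.length && !wanted(tokens[pos])) pos++;
    return pos > start
      ? { type: 'ERROR', text: tokens.slice(start, pos).join(' ') }
      : null;
  }

  // Returns [integer] or [ERROR, integer]; MISSING stands in at end of input.
  function operand() {
    const err = recover(isInt);
    const node = pos < tokens.length
      ? { type: 'integer', text: tokens[pos++] }
      : { type: 'MISSING', text: '' };
    return err ? [err, node] : [node];
  }

  const first = operand();
  let left = first.pop();                            // the operand itself
  const root = { type: 'program', children: first }; // leading ERROR, if any
  let trailing = null;
  while (pos < tokens.length) {
    const junk = recover(isPlus);                    // junk before the next '+'
    if (pos >= tokens.length) { trailing = junk; break; }
    pos++;                                           // consume '+'
    const right = operand();
    // Any ERROR nodes end up between left and right inside the binary node.
    left = { type: 'binary', children: [left, ...(junk ? [junk] : []), ...right] };
  }
  root.children.push(left);
  if (trailing) root.children.push(trailing);
  return root;
}

// Print the tree as an S-expression of node types.
function toSexp(node) {
  const kids = (node.children || []).map(toSexp).join(' ');
  return kids ? `(${node.type} ${kids})` : `(${node.type})`;
}

console.log(toSexp(parseSum('1 + 2')));    // (program (binary (integer) (integer)))
console.log(toSexp(parseSum('1 + @@ 2'))); // (program (binary (integer) (ERROR) (integer)))
```

In this sketch the ERROR node is an ordinary child, so a highlighter or refactoring tool can still walk the rest of the tree around it.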

Be warned that implementing the lexer and parser for Crystal in C is a lot of work! The lexer has about 3200 lines of code, and the parser has like 6000 lines. And Crystal is pretty concise!

Yeah, it’s not going to be trivial, but the Ruby one seems to be just 1k lines in the JS DSL (grammar.js in tree-sitter/tree-sitter-ruby on GitHub), then another 1k of C++ scanner code (scanner.cc in the same repo), then some other things that seem generated to me.

If I understand correctly, writing scanners (in C or anything else) is not always required in tree-sitter. From the “Creating Parsers” section of the Tree-sitter docs:

Many languages have some tokens whose structure is impossible or inconvenient to describe with a regular expression.

Tree-sitter allows you to handle these kinds of tokens using external scanners. An external scanner is a set of C functions that you, the grammar author, can write by hand in order to add custom logic for recognizing certain tokens.

So the only required thing is specifying the language grammar in the JavaScript DSL, which in Ruby’s case is less than 1000 lines.
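
For a feel of what that DSL looks like, here is a minimal, hypothetical grammar.js sketch. It is not taken from will/tree-sitter-crystal or any other repo; the language name and rule names are made up, and it only covers plain symbols like `:foo`, integers, and identifiers. The functions `grammar`, `choice`, and `repeat` are globals that the tree-sitter CLI provides when it evaluates the file, so this is not meant to run under plain Node.

```javascript
// grammar.js: a hypothetical, minimal sketch, not an existing Crystal grammar.
// `grammar`, `choice`, and `repeat` are supplied by the tree-sitter CLI.
module.exports = grammar({
  name: 'crystal_sketch', // hypothetical language name

  rules: {
    // The first rule listed is the start rule.
    source_file: $ => repeat($._expression),

    // A leading underscore hides the rule: it gets no node of its own in the tree.
    _expression: $ => choice($.symbol, $.integer, $.identifier),

    // Plain symbols only (:foo, :bar?); quoted and operator symbols are left out.
    symbol: $ => /:[A-Za-z_][A-Za-z0-9_]*[?!]?/,

    integer: $ => /\d+/,

    identifier: $ => /[a-z_][A-Za-z0-9_]*[?!]?/,
  },
});
```

Running `tree-sitter generate` in a directory containing a file like this is what produces the generated C parser mentioned in the thread.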

As already mentioned here, the parser is actually generated from the grammar described in the tree-sitter JS DSL.

Another cool thing with tree-sitter is that there are already WASM bindings, so once you’ve created a parser with tree-sitter you can use it on the web (see README.md in tree-sitter/tree-sitter on GitHub).

Some meaningful(less) stats for parsers automatically generated by tree-sitter:

For those of you who have written a parser or two in your life ;) is that a lot more lines of code than a handwritten parser in C would have?

Source grammars (terse nested JSON with not a lot of code per line)

I can recommend watching the talk. The last 10 minutes, about error handling, are the most interesting part.