Tree-sitter is a parsing library that was just brought into Neovim 0.5, and I think some other editors use it too. It’s intended to be usable by any editor, so, as with LSP servers, editors can all share a common tool for a given language.
It is really cool because it does fast incremental parsing: as you type, it can reparse just the changed part of the tree, and it tolerates errors (such as a partially typed keyword). This talk from the author is good: “Tree-sitter - a new parsing system for programming tools” (Strange Loop).
By having your editor be able to actually parse the code instead of just doing some complicated regexes, you get much better and faster syntax highlighting. But more than that it starts to enable refactoring tools that deal with the actual AST.
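To make the regex-versus-parser point concrete, here is a small sketch. It is not tree-sitter: it uses Python's standard `tokenize` module as a stand-in for "something that actually understands the code", and the sample source string is made up for illustration.

```python
import io
import keyword
import re
import tokenize

# Made-up sample: the word "if" appears once inside a string and once as code.
SOURCE = 'x = "if this were code"\nif x:\n    pass\n'


def regex_keywords(src):
    """Naive highlighter: every textual match counts, even inside strings."""
    return [m.group() for m in re.finditer(r"\b(if|pass)\b", src)]


def token_keywords(src):
    """Token-aware highlighter: only NAME tokens that are real keywords count."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            found.append(tok.string)
    return found


print(regex_keywords(SOURCE))  # ['if', 'if', 'pass'] (first hit is inside the string)
print(token_keywords(SOURCE))  # ['if', 'pass']
```

The regex version would colour the `if` inside the string literal as a keyword; the token-aware version does not, because it knows that text belongs to a string.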
I started trying to write one for Crystal, but it’s a pretty daunting project, and I know I’ll never be able to finish it myself. If anyone is interested, I’ve put what little I’ve got so far (just symbols) at will/tree-sitter-crystal on GitHub. I think something like this would work well Rubinius-commit-bit style, where anyone who gets a PR merged gets commit access.
Hopefully it can eventually get into a state where it gets added to the http://github.com/tree-sitter org, since that seems to be the package manager of sorts for the grammars.
I am very interested in this same idea – I also use nvim 0.5, and I’d also love a tree-sitter parser for Crystal. I’m not especially good at tree-sitter (I’ve tried poking at keidax’s grammar linked above recently with no real success).
But if I’ve got time and capability, I’ll have to shoot a PR your way with whatever I can get working.
I haven’t read anything about this, nor have I seen the video/presentation, but… does this have to be implemented in C, with all that grammar machinery? Would it be possible to reuse Crystal’s lexer and parser for this, which are written in Crystal?
You could maybe get by with some sort of trick compiling the parser down into a static lib and including that, but it’d be hard.
However, one of the main features is recovering from errors. Here you can see that in Ruby I put an error in the middle, but it was able to spot that error and continue parsing.
Ah, I see. Yeah, the parser can’t do that, but it could be modified with a flag to allow those scenarios. Though it’s strange… where does the error live? It looks like there’s a binary AST node, but it has an ERROR right after left and before right.
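As a rough illustration of how an error can end up living inside a binary node, here is a toy parser sketch. It is an assumption-laden stand-in, not tree-sitter's actual recovery algorithm (which is considerably more sophisticated): it handles only `number (op number)*`, and whenever it meets a token it did not expect, it wraps that token in an `ERROR` node and keeps going. All names here (`tokenize_expr`, `parse`, `to_sexp`) are hypothetical.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize_expr(src):
    """Split the source into (kind, text) pairs: number, op, or unknown."""
    tokens = []
    for number, other in TOKEN_RE.findall(src):
        if number:
            tokens.append(("number", number))
        elif other.strip():
            tokens.append(("op" if other in "+-*/" else "unknown", other))
    return tokens


def parse(src):
    """Parse `number (op number)*`, turning unexpected tokens into ERROR nodes."""
    tokens = tokenize_expr(src)
    pos = 0

    def skip_until(kind):
        # Recovery step: consume tokens that do not fit and keep them as ERROR nodes.
        nonlocal pos
        errors = []
        while pos < len(tokens) and tokens[pos][0] != kind:
            errors.append(("ERROR", tokens[pos][1]))
            pos += 1
        return errors

    leading = skip_until("number")
    if pos == len(tokens):
        return ("source", leading)
    left = ("number", tokens[pos][1])
    pos += 1
    while True:
        stray = skip_until("op")
        if pos == len(tokens):
            return ("source", leading + [left] + stray)
        op = ("op", tokens[pos][1])
        pos += 1
        errors = skip_until("number")
        if pos == len(tokens):
            right = ("MISSING", "number")  # mark the absent right operand
        else:
            right = ("number", tokens[pos][1])
            pos += 1
        # The ERROR nodes sit between the operator and the right operand.
        left = ("binary", [left] + stray + [op] + errors + [right])


def to_sexp(node):
    """Render a node as an S-expression string."""
    kind, value = node
    if isinstance(value, list):
        return "(" + " ".join([kind] + [to_sexp(child) for child in value]) + ")"
    return f"({kind} {value!r})"


print(to_sexp(parse("1 + $ 2")))
# (source (binary (number '1') (op '+') (ERROR '$') (number '2')))
```

In this sketch the `ERROR` node is simply an extra child of the `binary` node, which matches the shape described above: the bad token is recorded where it was found, and the rest of the expression still parses.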
Be warned that implementing the lexer and parser for Crystal in C is a lot of work! The lexer has about 3200 lines of code, and the parser has like 6000 lines. And Crystal is pretty concise!
If I understand correctly, writing scanners (in C or anything else) is not always required in tree-sitter. From the Tree-sitter “Creating Parsers” documentation:
Many languages have some tokens whose structure is impossible or inconvenient to describe with a regular expression.
Tree-sitter allows you to handle these kinds of tokens using external scanners. An external scanner is a set of C functions that you, the grammar author, can write by hand in order to add custom logic for recognizing certain tokens.
So the only required thing is specifying the language grammar in the JavaScript DSL, which in Ruby’s case is less than 1000 lines.
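For a feel of the kind of token the quoted docs mean, take Crystal's `%(...)` string literals, where nested parentheses must balance (this example is my own pick, not something from the docs quoted above). A regular expression cannot track arbitrary nesting depth, but a few lines of hand-written logic can. A real external scanner would be C functions working against tree-sitter's lexer; the Python below only mirrors the counting logic, ignores escapes and interpolation, and `scan_percent_string` is a hypothetical name.

```python
def scan_percent_string(src, start=0):
    """Return the index just past a %( ... ) literal starting at `start`.

    Returns None if there is no such literal at `start` or it is unterminated.
    Simplified sketch: escapes and interpolation are not handled.
    """
    if not src.startswith("%(", start):
        return None
    depth = 1
    i = start + 2
    while i < len(src):
        ch = src[i]
        if ch == "(":
            depth += 1  # a nested opener must be closed before the literal ends
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return None  # ran out of input before the delimiters balanced


src = "%(a (b) c) + 1"
end = scan_percent_string(src)
print(src[:end])  # %(a (b) c)
```

The depth counter is the part a plain regex cannot express, which is the sort of situation where an external scanner becomes necessary rather than optional.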