Parser combinators

I’ve been thinking about parser combinators lately…

There are some libraries out there but I have a feeling they are all a bit inefficient: they don’t use Char::Reader and they usually create a lot of intermediate arrays and objects. They also use some functional concepts that, eventually, in my taste, makes the code a bit hard to undertand.

I’ve been wanting to do a good and efficient parser combinator library, as a shard, a bit for fun and also because it could actually be useufl.

But… I’m curious…

  • Are you using one of the existing libraries for production purposes?
  • If not. why? Is it because you didn’t need them, or because you used them and they turned out to be slow or not fitting your exact purposes? Or because it was just easier to do a hand-written parser?

I’d only like to create a shard for this if it’s going to actually be useful. If it’s just for fun I can keep it to myself :-) (that way I don’t have to maintain it for others)

6 Likes

Just a note: me saying the existing libraries aren’t good enough for my taste isn’t necessary a bad thing. If these libraries exist just for fun then that’s totally fine. What I’d like to do is to create an efficient library like that, but only if someone is going to use it :smile:

I just paid money for this parser class last week. It’s all a little over my head but I think it’s slowly starting to make sense.

Now, if someone were to make a Shard that was efficient and ergonomic to use… that would be fine with me! :joy:

I don’t have any serious use cases. But I think it would be useful for one of my hobby projects.

As part of developing a competitive snake for Battlesnake I have ended up creating a kinda customised chess notation (snake notation) for writing tests and logging.
It looks like this:

🔥 🌶️ 🔥 🔥 🔥 🔥 🔥
🧡 🟧 🟧 d6 e6 f6 g6
a5 b5 🟧 🟠 🍎 f5 g5
a4 b4 c4 d4 e4 f4 g4
💜 b3 c3 🍎 e3 f3 g3
🟪 🟪 🟪 🟣 e2 f2 g2
a1 b1 c1 d1 e1 f1 g1

Currently, I am parsing it with a very large nested case statement that has a lot of long regexes.
examples:

when /\A(?:[a-z]+\d+)*(?:\+\+|🍎)\z/
when /\A(?:[a-z]+\d+)*(?:H|B|T)\d+\z/
when /\A(?:[a-z]+\d+)*(?:🟣|🟠|🔵|🟢|🟡|🟤|⚫|⚪|🟪|🟧|🟦|🟩|🟨|🟫|⬛|⬜|💜|🧡|💙|💚|💛|🤎|🖤|🤍)\z/

The entire thing is becoming more and more unwieldy as I add features, so I am thinking that I should move it to a parser combinator.

1 Like

I’m new to this whole parsers/languages thing, but I’m slowly learning.

Recently I’ve started building a parser using GitHub - jemc/crystal-pegmatite: A high-performance Parsing Expression Grammar (PEG) library for the Crystal language. this is used to build GitHub - savi-lang/savi: A fast language for programmers who are passionate about their craft. language (based on Pony language runtime) and I want to see how it’ll go for me. I’m in the very beginning, but I like it’s approach so far: DSL and instead of flat stream of tokens it can emit flat stream but with with parent/child relationship embedded into it. It uses Tuple for each token with {type, start, end}. Not sure how common this approach is but token itself uses minimal amount of memory. Then there is a helper class to navigate this stream as a tree-like structure.

The performance and memory use for simple parser is better than the one I’ve based on Regex, but it can become slower as I add more features I think. We’ll see.

Interestingly enough there is a TODO comment to use Char::Reader (probably for optimization too?) crystal-pegmatite/unicode_any.cr at 429e47c2b8d84fc26264d19f249c8d76213b5a4a · jemc/crystal-pegmatite · GitHub

Thank you!

I think I’ll eventually publish it as a shard, even if just for fun, because… well, it’s fun :-)

4 Likes

https://www.destroyallsoftware.com/screencasts/catalog/a-compiler-from-scratch

1 Like

Also have this class but going through interpreters class first by same author.

Offtopic: What an interesting initiative (destroyallsoftware.com)!

The name makes me laugh a lot, not on a negative sense, but because how clever it is. We can imagine the creativity process behind its authors, saying things like “how can we express the reason behind our courses”, haha. :joy:

Really interesting site, which I’m considering to subscribe :slight_smile:

1 Like

I would suggest to look at a Pratt parser. Pratt parser is more flexible, faster and easier to implement and to test, see Pratt Parsers: Expression Parsing Made Easy – journal.stuffwithstuff.com

2 Likes

I hand built a parser for Mint (not really a parser combinator per-se), which now and then thinking about moving to a separate shard. You can check if for inspiration.

I played around with tree sitter and while it’s easy to get started it’s fundamentally limited and you have to write C code of you want something custom.

For mint how did you go about settling on a design. In the end it translates to js correct? Do you have any documentation, or other work that inspired your design and implementation?

Yes, it translates to JavaScript.

Originally I started in Ruby with (parslet - About), but it turned out that I wanted to have better error messages and a static binary, so I started to write a parser by hand in Crystal with no prior research into parsers or compilers.

One weird thing about Mint parser is that it doesn’t have a tokenizer phase, parsing returns the AST directly, which is good and bad. It’s good because it gives you greater control, and it’s faster, and bad because it’s harder to implement some features like comments, for example.

1 Like