Parser combinators

asterite · January 28, 2022, 6:03pm

I’ve been thinking about parser combinators lately…

There are some libraries out there but I have a feeling they are all a bit inefficient: they don’t use Char::Reader and they usually create a lot of intermediate arrays and objects. They also use some functional concepts that, eventually, in my taste, makes the code a bit hard to undertand.

I’ve been wanting to do a good and efficient parser combinator library, as a shard, a bit for fun and also because it could actually be useufl.

But… I’m curious…

Are you using one of the existing libraries for production purposes?
If not. why? Is it because you didn’t need them, or because you used them and they turned out to be slow or not fitting your exact purposes? Or because it was just easier to do a hand-written parser?

I’d only like to create a shard for this if it’s going to actually be useful. If it’s just for fun I can keep it to myself :-) (that way I don’t have to maintain it for others)

asterite · January 28, 2022, 6:18pm

Just a note: me saying the existing libraries aren’t good enough for my taste isn’t necessary a bad thing. If these libraries exist just for fun then that’s totally fine. What I’d like to do is to create an efficient library like that, but only if someone is going to use it

speg · January 28, 2022, 7:29pm

I just paid money for this parser class last week. It’s all a little over my head but I think it’s slowly starting to make sense.

Now, if someone were to make a Shard that was efficient and ergonomic to use… that would be fine with me!

chillfox · January 29, 2022, 5:18am

I don’t have any serious use cases. But I think it would be useful for one of my hobby projects.

As part of developing a competitive snake for Battlesnake I have ended up creating a kinda customised chess notation (snake notation) for writing tests and logging.
It looks like this:

🔥 🌶️ 🔥 🔥 🔥 🔥 🔥
🧡 🟧 🟧 d6 e6 f6 g6
a5 b5 🟧 🟠 🍎 f5 g5
a4 b4 c4 d4 e4 f4 g4
💜 b3 c3 🍎 e3 f3 g3
🟪 🟪 🟪 🟣 e2 f2 g2
a1 b1 c1 d1 e1 f1 g1

Currently, I am parsing it with a very large nested case statement that has a lot of long regexes.
examples:

when /\A(?:[a-z]+\d+)*(?:\+\+|🍎)\z/
when /\A(?:[a-z]+\d+)*(?:H|B|T)\d+\z/
when /\A(?:[a-z]+\d+)*(?:🟣|🟠|🔵|🟢|🟡|🟤|⚫|⚪|🟪|🟧|🟦|🟩|🟨|🟫|⬛|⬜|💜|🧡|💙|💚|💛|🤎|🖤|🤍)\z/

The entire thing is becoming more and more unwieldy as I add features, so I am thinking that I should move it to a parser combinator.

vlazar · January 29, 2022, 6:13am

I’m new to this whole parsers/languages thing, but I’m slowly learning.

Recently I’ve started building a parser using GitHub - jemc/crystal-pegmatite: A high-performance Parsing Expression Grammar (PEG) library for the Crystal language. this is used to build GitHub - savi-lang/savi: A fast language for programmers who are passionate about their craft. language (based on Pony language runtime) and I want to see how it’ll go for me. I’m in the very beginning, but I like it’s approach so far: DSL and instead of flat stream of tokens it can emit flat stream but with with parent/child relationship embedded into it. It uses Tuple for each token with {type, start, end}. Not sure how common this approach is but token itself uses minimal amount of memory. Then there is a helper class to navigate this stream as a tree-like structure.

The performance and memory use for simple parser is better than the one I’ve based on Regex, but it can become slower as I add more features I think. We’ll see.

Interestingly enough there is a TODO comment to use Char::Reader (probably for optimization too?) crystal-pegmatite/unicode_any.cr at 429e47c2b8d84fc26264d19f249c8d76213b5a4a · jemc/crystal-pegmatite · GitHub

asterite · January 31, 2022, 11:48am

Thank you!

I think I’ll eventually publish it as a shard, even if just for fun, because… well, it’s fun :-)

cloutiy · January 31, 2022, 3:10pm

https://www.destroyallsoftware.com/screencasts/catalog/a-compiler-from-scratch

cloutiy · January 31, 2022, 3:11pm

Also have this class but going through interpreters class first by same author.

paulocoghi · February 4, 2022, 1:21pm

Offtopic: What an interesting initiative (destroyallsoftware.com)!

The name makes me laugh a lot, not on a negative sense, but because how clever it is. We can imagine the creativity process behind its authors, saying things like “how can we express the reason behind our courses”, haha.

Really interesting site, which I’m considering to subscribe

gummybears · February 5, 2022, 2:16am

I would suggest to look at a Pratt parser. Pratt parser is more flexible, faster and easier to implement and to test, see Pratt Parsers: Expression Parsing Made Easy – journal.stuffwithstuff.com

gdotdesign · February 6, 2022, 9:51am

I hand built a parser for Mint (not really a parser combinator per-se), which now and then thinking about moving to a separate shard. You can check if for inspiration.

I played around with tree sitter and while it’s easy to get started it’s fundamentally limited and you have to write C code of you want something custom.

cloutiy · February 7, 2022, 8:40pm

For mint how did you go about settling on a design. In the end it translates to js correct? Do you have any documentation, or other work that inspired your design and implementation?

gdotdesign · February 8, 2022, 6:12am

Yes, it translates to JavaScript.

Originally I started in Ruby with (parslet - About), but it turned out that I wanted to have better error messages and a static binary, so I started to write a parser by hand in Crystal with no prior research into parsers or compilers.

One weird thing about Mint parser is that it doesn’t have a tokenizer phase, parsing returns the AST directly, which is good and bad. It’s good because it gives you greater control, and it’s faster, and bad because it’s harder to implement some features like comments, for example.

Topic		Replies	Views
Parsem, (yet another) parser combinator library Community	2	561	August 21, 2022
[Video tutorial] Building an interactive DSL Learning Resources	2	888	July 10, 2020
Crystalline - LSP server for Crystal Community	40	4645	June 10, 2022
Prototype BlingFire's Crystal binding, a tokenizer for Crystal News	1	388	August 21, 2023
Greetings from a programming neophyte! Introductions	2	465	January 7, 2020

Parser combinators

Related topics