Ongoing experiment: ident pool

A few days ago I started working on a compiler refactor, which is in this branch.

Right now the compiler uses String all over the place: a method, a call, an argument, etc., their names are strings. Variables tracked by the compiler also use strings. They are put in a Hash for lookups. This lookup involves calling String#hash and String#== to compare keys. This is not very efficient!

Most, if not all, compilers, use a different type for “identifiers” like variable names, method names, etc. The idea is to use a type that’s more efficient for storing inside a Hash. One idea is to use a wrapper around String where #hash is the String’s object_id, and comparing these types is comparing the String’s object_id. For this to work we must guarantee that a same string, like “x”, is always backed by the same string instance. This can be done with a StringPool.

That’s the gist of the branch I mentioned before.

It’s a huge refactor because the compiler’s code is huge and full of string literals and string comparisons.

The branch can already produce a compiler that can run some specs, like array_spec.cr or int_spec.cr. Compiler specs don’t run yet and there’s a lot more to be done.

But…

When I use this new compiler, the performance turns to actually be slower. Not significantly slower, though. But I was expecting the performance to be at least equal than before, not slower. I still don’t understand why.

If someone wants to take a look at the refactor and maybe compiler the compiler in release mode and check the output of doing bin/crystal build -s spec/std/array_spec.cr and comparing the times with that branch’s compiler and master to see if you get similar results than me, please do so! Or if someone can figure out why it’s slower than before, that would be huge!

I’m also sure the compiler spends a lot more time allocating memory than doing Hash lookups, so maybe that’s why the performance isn’t significantly improved. But it’s still puzzling that it’s slower… :thinking:

9 Likes

Compiled the new compiler and I’m getting the following error when running the spec

➜  crystal-1 git:(compiler/ident) time bin/crystal build -s spec/std/array_spec.cr
Using compiled compiler at .build/crystal
Parse:                             00:00:00.003804375 (   3.55MB)
Semantic (top level):              00:00:00.103378875 (  64.05MB)
Semantic (new):                    00:00:00.000620750 (  64.05MB)
Semantic (type declarations):      00:00:00.010865917 (  80.05MB)
Semantic (abstract def check):     00:00:00.003592625 (  80.05MB)
Semantic (restrictions augmenter): 00:00:00.003443375 (  80.05MB)
Showing last frame. Use --error-trace for full trace.

In src/string.cr:217:31

 217 | raise ArgumentError.new("Cannot create a string with a null pointer and a non-zero (#{bytesize}) bytesize")
                               ^------------
Error: no overload matches 'String.interpolation' with types String, UInt64, String

Overloads are:
 - String.interpolation(value : String, char : Char)
 - String.interpolation(char : Char, value : String)
 - String.interpolation(value : String)
 - String.interpolation(value)
 - String.interpolation(*values : String)
 - String.interpolation(*values : *T)
bin/crystal build -s spec/std/array_spec.cr  0.21s user 0.08s system 120% cpu 0.238 total

Oops, thanks! I forgot to push the latest changes. Could you try now?

Just ran the spec successfully and you’re right your compiler is just a bit slower, 1.264 vs 1.209 ran on an M1

master

bin/crystal build -s spec/std/array_spec.cr  1.48s user 0.25s system 142% cpu 1.209 total

compiler/ident

bin/crystal build -s spec/std/array_spec.cr  1.55s user 0.26s system 143% cpu 1.264 total
1 Like

Going from a String to a Crystal::Ident still involves hashing inside StringPool, doesn’t it?

Correct!

But the way things are, the Idents are created early by the parser. There’s a bit more Ident creation during literal expansion and normalization. And there are a few more entries during semantic. But the idea is that once the pool of strings is created, putting these Idents in a Hash, looking them up or comparing them doesn’t involve a hash anymore. Unless I’m doing something wrong (I probably am!)

Put another way: every lookup on String involves a Hash. But with a pool it involves a hash once, but never again.

Yeah, but even in the worst case that every string occurs only once, I think there really shouldn’t be too much of an overhead. So something’s probably going wrong here.
We could do some performance measurements to see where the time is spent (especially in comparison to before/after the patch).

Btw. I think the title of this thread is a bit pessimistic… shouldn’t it be called an ongoing experiment? ;)

1 Like

Throwing a profiler at it I get
Screenshot from 2022-07-20 22-16-55

This is to be compared to the same when running unodified master:
Screenshot from 2022-07-20 22-17-13

There is a couple of things to note there (holy GC batman!), but the thing that really stands out to me is that in the remade variant Int#>> is suddenly taking more time. But it is already quite unreasonable that it takes that big of a share of total time in master. Could it be Redundant code is generated when shifting integer types by constant · Issue #11845 · crystal-lang/crystal · GitHub that is the issue?

Other than that I’m skeptical this approach with a hash based string pool will gain big improvements as it will still perform the hashes on every lookup. I’ve seen people try to use string pools several times, but never has it resulted in faster code.

A variant that may perform better could be to instead store the strings in a string array, but keep a hash table for the cases when insert happens. Then lookup would be just a pointer reference away and equality comparisons would literally be a integer comparison of the integer indexes, which also is what would be passed around.

EDIT: It might need a bigger perf test than the array_spec run that sdogroyul showed to get more representative data. Any tests should probably also not include codegen, as that shouldn’t be affected by this change and just introduce noise into the tracing measurements. I won’t be able to test until tuesday at the earliest though.

EDIT2: This is nonsense. See later followup message.

Right! But the hash of Ident is just the hash of an integer. In fact I also tried making the hash method of Ident just be the string’s object_id and I got the same results. Computing the hash of a String involves doing a hash of all its bytes, which I’m sure is more expensive. But maybe it’s not that much expensive? :thinking:

Could it be Redundant code is generated when shifting integer types by constant · Issue #11845 · crystal-lang/crystal · GitHub that is the issue?

I just tried replacing everything in hasher with unsafe_shr and unsafe_shl and doing a benchmark of #hash, and nothing changed. So maybe even though it generates worse code, the CPU executes it equally fine because of branch prediction (no idea)

In any case, I’m now thinking that the main thing to optimize in the compiler is reducing memory allocations. AST nodes are huge (the base is 72 bytes, and a Call is 216 bytes (!!)) and when compiling the compiler there are like 600K calls. That’s like 123MB of memory allocated just for calls. But there are many other AST node types. And none of those are ever freed.

I’m not sure how that can be optimized, though. It will require some deep thinking. But there’s no pressure to do it as the compile times aren’t terribly bad either, at least for medium-sized projects.

On the topic of memory reduction, did you see Andrew Kelley - Practical DOD on Vimeo (which was posted in #beginners in discord some week ago or so)? It may give some inspiration for that kind of work and goes through a bunch of techniques for doing that. First half is general talk about how allocations work that probably is old hat for you, but in the second part he goes on to use that on the Zig compiler.

I don’t know how easy it is to apply the ideas in it though.

On the topic of memory reduction, did you see Andrew Kelley - Practical DOD on Vimeo (which was posted in #beginners in discord some week ago or so)?

I did! It’s a good talk. But I’m afraid that it might make the code too cryptic if we go with such optimizations. That said, the compiler’s code is written in a very naive way regarding optimizations. Well, it’s not like we didn’t care at all about it, but there’s still a lot to do. Optimizing memory allocations will be a huge part of it.

Regarding that specific talk, I’m not so sure that optimizing memory is all you have to do. Andrew says that he optimized the size of tokens because there are a lot of those! But for example Crystal only has one token at a time to parse things, and maybe a couple more cached as look aheads. So it doesn’t matter how big a token is because there’s only a few of them in memory at the same time. So he might want to also look at other things to optimize (like “free lists”). I actually learned a lot about such optimizations by studying the D compiler’s source code.

2 Likes

Regarding profiling the compiler…

Right now, at least for me, it’s very tricky. Here’s a call graph:

The main issue is that this is how the compiler works:

  • It sees a call so it tries to type it
  • To type it, it looks for target methods
  • Once it finds the methods, it tried to type them
  • If the method has a call, it tries to type it
  • And so on…

That means that the call graph is extremely deep. Like, I can continue expanding that graph and it never ends.

Another big issue with that is that if your program has a call deep enough, then the compiler will stack overflow. That’s not good! That said, it’s hard that it happens in practice because it might also mean that you’ll get a stack overflow at runtime.

I quickly tried to move things to a work queue so that the call graph is flattened… but that doesn’t work because in some cases the compiler will eagerly verify types, assuming calls are typed right away.

Does someone know of profiling tools where you could see where most time is spent? Or where most memory allocations happen?

Linux only