A few days ago I started working on a compiler refactor, which is in this branch.
Right now the compiler uses String all over the place: the names of methods, calls, arguments, and so on are all strings. Variables tracked by the compiler also use strings. They are put in a Hash for lookups, and each lookup involves calling String#hash and String#== to compare keys. This is not very efficient!
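To make the cost concrete, here's a tiny, hypothetical illustration (not actual compiler code): with String keys, every lookup has to hash the key's contents and, on a match, compare them character by character.

```crystal
# Hypothetical example, not actual compiler code: variable lookup with
# String keys. Every access rehashes the key's characters and, when a
# bucket matches, compares the strings' contents.
vars = {} of String => Int32
vars["some_long_variable_name"] = 1
vars["some_long_variable_name"] # calls String#hash and String#== over the full contents
```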
Most, if not all, compilers use a different type for “identifiers” like variable names, method names, etc. The idea is to use a type that’s more efficient to store inside a Hash. One idea is a wrapper around String whose #hash is the String’s object_id, and whose equality check compares the Strings’ object_ids. For this to work we must guarantee that the same string, like “x”, is always backed by the same String instance. This can be done with a StringPool.
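As a rough sketch of that idea (the names here are hypothetical, not the ones used in the branch), the wrapper plus the interning could look something like this:

```crystal
require "string_pool"

# Hypothetical sketch: an identifier that hashes and compares by the
# identity of an interned String rather than by its contents.
struct Identifier
  getter string : String

  def initialize(@string : String)
  end

  # Hash based on the String's object_id instead of its characters.
  def hash(hasher)
    @string.object_id.hash(hasher)
  end

  # Two identifiers are equal when they wrap the exact same String instance.
  def ==(other : Identifier)
    @string.same?(other.string)
  end

  def to_s(io : IO) : Nil
    io << @string
  end
end

# Interning: guarantee that the same text always yields the same String
# instance, so identity comparison is valid.
POOL = StringPool.new

def identifier(text : String) : Identifier
  Identifier.new(POOL.get(text))
end

a = identifier("x")
b = identifier("x")
a == b # => true, decided by object identity, not character by character
```

With something like this, Hash lookups on identifiers reduce to hashing an integer and comparing pointers instead of walking the string’s bytes.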
That’s the gist of the branch I mentioned before.
It’s a huge refactor because the compiler’s code is huge and full of string literals and string comparisons.
The branch can already produce a compiler that can run some specs, like array_spec.cr or int_spec.cr. Compiler specs don’t run yet and there’s a lot more to be done.
But…
When I use this new compiler, the performance actually turns out to be slower. Not significantly slower, but I was expecting the performance to be at least equal to before, not worse. I still don’t understand why.
If someone wants to take a look at the refactor, please do: compile the compiler in release mode, run bin/crystal build -s spec/std/array_spec.cr with both that branch’s compiler and master’s, and see whether you get similar results to mine. Or if someone can figure out why it’s slower than before, that would be huge!
I’m also sure the compiler spends a lot more time allocating memory than doing Hash lookups, so maybe that’s why the performance isn’t significantly improved. But it’s still puzzling that it’s slower…