Why aren't strings pooled?

Strings are immutable, but from the code it doesn’t look like they are pooled by default (although there is a StringPool class). Why not? I would guess that the improvement in cache locality would offset any speed hit from the pool lookup. It looks like the compiler does try to pool literals.

a = "foo"
b = "fox".sub("x", "o")
p pointerof(a).value.as(Pointer(UInt8))
p pointerof(b).value.as(Pointer(UInt8))
p a == b

prints:

"Pointer(String)@0x7ffdacedce00"
"Pointer(String)@0x7ffdacedcdf8"
true
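
The two addresses printed above differ by 8 bytes and sit in the range typically used for the stack, so they appear to be the addresses of the variables a and b rather than of the string objects they refer to. A minimal sketch of a more direct identity check, assuming only `Reference#same?` and `Reference#object_id` from the standard library (the variable `c` is added here for illustration):

```crystal
a = "foo"
b = "fox".sub("x", "o") # built at run time, so it is a separate allocation
c = "foo"               # a second literal with the same contents as a

# == compares contents; same? compares object identity.
puts a == b     # contents of the literal vs. the run-time string
puts a.same?(b) # identity of the literal vs. the run-time string
puts a.same?(c) # identity of two equal literals (shows whether literals are shared)

# For reference types, object_id is derived from the object's address.
puts a.object_id
puts b.object_id
```

If `a.same?(c)` prints true while `a.same?(b)` prints false, that matches the observation that literals are shared but strings built at run time are not.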

Answering myself: I guess there is a complication for GC: the pool itself would hold a reference to every string, so nothing in it would ever be collected. I’d have to think of how to work around that.

Is there any language that does this?

@asterite A lot of interpreters do, certainly older and embedded ones. It helped on the memory-constrained systems they ran on.

I figured out that I could monkey-patch String to pool just by changing its constructors, but I haven’t looked at how difficult it would be to teach the Boehm collector to treat the pool’s internal references as weak and to remove dead strings from the pool.
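
As a rough illustration of the idea, here is a sketch that routes strings through the standard library's StringPool by hand instead of patching the constructors. The constant `INTERN_POOL` and the method `pooled` are hypothetical names made up for this example, not part of the stdlib. Note that the pool holds ordinary (strong) references, so this sketch has exactly the GC problem described above: pooled strings are never collected.

```crystal
require "string_pool"

class String
  # Hypothetical global pool, not part of the standard library.
  # It holds strong references, so pooled strings are never freed.
  INTERN_POOL = StringPool.new

  # Hypothetical helper: returns the pool's canonical string
  # with the same contents as self.
  def pooled : String
    INTERN_POOL.get(self)
  end
end

a = "foo".pooled
b = "fox".sub("x", "o").pooled

puts a.same?(b)                # both now refer to the pool's single "foo"
puts String::INTERN_POOL.size  # number of distinct strings in the pool
```

Pooling inside the constructors, as described above, would make this automatic, but every string creation would then pay for a hash lookup.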


Yeah, GC is the main issue.

In fact, Java used to implement substrings by having the substring point into the larger string’s storage. The problem is that sometimes you read a huge file into memory, kept a small substring, and stopped using the larger string. The larger string’s storage was still referenced by the substring, so it couldn’t be freed.
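
The same trade-off can be sketched in Crystal with Bytes (Slice(UInt8)), where taking a sub-slice gives a view into the original buffer. This is only an analogy for the Java behaviour, not how Java implemented it, and the 1 MiB buffer is a made-up stand-in for a large file:

```crystal
# Stand-in for a large file read into memory (1 MiB of zero bytes).
big = Bytes.new(1024 * 1024)

# A view: it points into big's buffer, so as long as the view is
# reachable, the collector has to keep the whole buffer alive.
view = big[0, 16]

# A copy: it owns a fresh 16-byte buffer and does not depend on big.
copy = Bytes.new(16)
copy.copy_from(view)

puts view.to_unsafe == big.to_unsafe # does the view share big's memory?
puts copy.to_unsafe == big.to_unsafe # does the copy share big's memory?
```

Copying costs an allocation per substring, but it means a small piece never pins a large buffer.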

In Crystal we just do things the simple and dumb way and we avoid those problems, but we also lose some performance opportunities (for example, in Ruby, Array and String are copy-on-write and the GC knows about this, I think).


@asterite As far as I can tell, many of the “simple and dumb” decisions are in the stdlib rather than the compiler, and thus available for me to (literally) monkey with, and I don’t have to understand the compiler internals. So, this is on my queue to experiment with.
