String is a Class, why?

This is just out of curiosity, and came up when a friend asked why Strings are a Class in Crystal instead of a Struct.

Unless I’m missing something, the String class is essentially a wrapper around a Pointer(UInt8) in memory, where the first 12 bytes are a header (a type ID, size in bytes, and size in code points). The #to_unsafe method returns a pointer to the data that exists after this small header. Is this all correct? If so, why isn’t String a struct, since the underlying data is just a pointer?

I did some searching both here and on Google and didn’t find anything that quite answered my question, though I did find this older post that shed a bit of light on things. Is it the same reasoning, “do things the simple and dumb way” to avoid GC issues in certain cases?

Nope, String does not have a pointer. The string data is actually directly embedded.

The memory layout of a string looks like this:

slice = Slice.new("foo".unsafe_as(Pointer(UInt8)), 16)
slice # => Bytes[1, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0, 102, 111, 111, 0]
      #          ^^TYPE_ID^  ^bytesize^  ^charsize^  'f'  'o'  'o'  \0

Every reference type has the TYPE_ID as first field. Then the bytesize and charsize fields are also pretty standard. But they’re immediately followed by the payload, the string content. This is not an additional pointer to somewhere else.
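To make the layout concrete, here is a minimal sketch that reads the three header fields by hand and then rebuilds the string from the bytes that follow them. It assumes the 12-byte header described above (three 32-bit fields) and relies on unsafe pointer reinterpretation, so treat it as an illustration rather than something to use in real code. The variable names are just for this example.

```crystal
str = "foo"

# Reinterpret the String reference as a pointer to 32-bit integers so the
# header fields can be read directly. Unsafe; for illustration only.
header = str.unsafe_as(Pointer(Int32))

type_id  = header[0] # type ID (the value depends on the program)
bytesize = header[1] # size in bytes
charsize = header[2] # size in code points

# Assumption: the content starts right after the 12-byte header.
content = Slice.new(str.unsafe_as(Pointer(UInt8)) + 12, bytesize)

puts type_id
puts bytesize == str.bytesize
puts charsize
puts String.new(content)
```

If the layout matches the description, the last line prints the original string, which shows that the payload sits inside the object itself rather than behind a second pointer.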


However, if you inspect the instance variables of a string, for example using crystal tool hierarchy, you’ll see that the last member of a string is a char, not a pointer. That’s the first char of the string, so a pointer to it gives you the entire data (which is how to_unsafe is implemented).
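One way to check this from the outside is to compare the address of the string object with the address returned by to_unsafe. This is only a sketch, and it assumes the header layout shown earlier in the thread: if the content really is embedded, the difference between the two addresses should be the header size (12 bytes), not some unrelated heap location.

```crystal
str = "foo"

# For reference types, object_id is the address of the object in memory.
object_address = str.object_id

# Address of the first content byte, as returned by to_unsafe.
data_address = str.to_unsafe.address

# Offset of the content inside the object (the header size).
offset = data_address - object_address
puts offset

# Reading through the pointer yields the content bytes,
# followed by the trailing null byte.
puts str.to_unsafe[0].chr
puts str.to_unsafe[str.bytesize]
```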

It’s a little trick. I think C# does the same thing, not sure.

Gotcha, thank you! I must have misread the code earlier. I see how it’s being implemented now.

So is it a class instead of a struct for reasons similar to what was mentioned in the other thread I linked? Reading a huge file into a string that lives on the stack sounds like it would be inefficient to pass around.