String.new with Pointer(Char)

I have been writing a few C bindings and have noticed that for some strings I have to cast with .as(Pointer(UInt8)). This is not too annoying and can be extracted into a helper method, but I was wondering why it is necessary. Char seems to be like an alias, and I don't know why this is a problem. It seems like some of the C bindings in the lib return Char* and work. Also, looking into submitting a patch, it looks like it would change some fundamental APIs. Would it be of benefit to move String to use Char instead of UInt8? It seems like Char is the fundamental unit that String is based on.

Example (Carcin playground):

p = Pointer(Char).malloc(3) # allocate room for three 4-byte Chars
p[0] = 'a'
p[1] = '\0'
p[2] = 'b'
s = String.new(p, 3) # fails to compile: String.new expects a Pointer(UInt8), hence the cast

I’m not sure I’m reading your comment correctly, but it seems you’re confusing Crystal’s Char type with C’s char. They share a name but are fundamentally different types. The latter represents an 8-bit character and is mostly equivalent to Crystal’s UInt8 (alias LibC::Char = UInt8). Crystal’s Char is 4 bytes wide and represents a Unicode codepoint, effectively UTF-32.
While a string consists of characters, String stores its data in UTF-8 encoding for space efficiency. That means 8 bits per code unit (one to four bytes per character), which makes the data format compatible with a char pointer in C.
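
You can see the difference directly (a small sketch; the # => values follow from the sizes and encodings described above):

sizeof(Char)  # => 4, one Unicode codepoint (UTF-32)
sizeof(UInt8) # => 1, one byte
s = "héllo"
s.size     # => 5 characters
s.bytesize # => 6 bytes, because 'é' takes two bytes in UTF-8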

Bottom line: the Char type is equivalent to a 32-bit unsigned integer in C. I don’t think there are any C APIs that use that to represent strings, so you should never need Pointer(Char) for library bindings.
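
For example, the snippet from the original post works once the pointer is typed as UInt8 (a minimal sketch of the corrected version):

p = Pointer(UInt8).malloc(3)
p[0] = 'a'.ord.to_u8
p[1] = 0_u8 # an embedded NUL byte is fine; String tracks its own length
p[2] = 'b'.ord.to_u8
s = String.new(p, 3) # copies the 3 bytes; s.bytesize => 3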

Yep, that’s already String#to_unsafe. It is applied implicitly when passing a String to a lib function.
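
A minimal sketch of that implicit conversion (MyLibC is just an illustrative name here; strlen is the real libc function):

lib MyLibC
  fun strlen(s : UInt8*) : LibC::SizeT
end

MyLibC.strlen("hello")           # => 5; the String is passed via #to_unsafe
MyLibC.strlen("hello".to_unsafe) # equivalent, with the conversion spelled out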


That makes a lot of sense. I was mistaken, and I think the compiler was saving me from myself. I will have to look into my C lib to make sure I am doing everything correctly.

The C/C++ equivalent of Crystal Char is wchar_t.

A wchar_t is not guaranteed to hold a Unicode code point, only a code unit of some encoding, which happens to be UTF-16 on Windows. Thus char32_t is probably closer. (This distinction is important because Crystal’s Win32 bindings use LibC::WCHAR everywhere.)
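
In practice that means converting to UTF-16 before calling Win32 APIs. The standard library’s String#to_utf16 produces a Slice(UInt16), which matches LibC::WCHAR on Windows (some_win32_fun below is a hypothetical binding, just to show the call shape):

utf16 = "héllo".to_utf16 # Slice(UInt16); a zero code unit follows the end for C interop
utf16.size               # => 5 UTF-16 code units
# some_win32_fun(utf16.to_unsafe) would pass a NUL-terminated UInt16*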


Exactly. In most places (or am I just thinking of Linux?) it will be 32 bits, but it may not be… One thing that I really dislike in the C and C++ standards is that they don’t define anything exactly; too much depends on the platform. I understand that the platform spectrum for C is really huge, but even so…

I think Crystal’s idea of naming the integer types Int32, Int64, etc. was a really good choice: there’s no doubt about a type’s size, so there’s no need for a configure script to check a lot of things before compiling. Now let’s wait and see whether the future agrees with that decision.
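
For comparison (a sketch; the alias noted in the comment is how Crystal’s standard library maps platform-dependent C types):

sizeof(Int32) # => 4 on every platform Crystal supports
sizeof(Int64) # => 8 on every platform Crystal supports
# Platform-dependent C types become explicit aliases instead,
# e.g. LibC::Long is Int64 on 64-bit Linux but Int32 on 64-bit Windows (LLP64)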
