I have been writing a few C extensions and have noticed that for some strings I have to cast with .as(Pointer(UInt8)). This is not too annoying and can be extracted to a method and such, but I was wondering why. It seems like an alias, and I don't know why it's a problem. Some of the C extensions in the lib seem to return Char* and work. Also, looking into submitting a patch, it looks like it would change some fundamental APIs. Would it be of benefit to move String to use Char instead of UInt8? Char seems like the fundamental unit that String is based on.
Example (Carcin):
p = Pointer(Char).malloc(3) # malloc, not Pointer(Char).new(3), which would treat 3 as an address
p[0] = 'a'
p[1] = '\0'
p[2] = 'b'
s = String.new(p, 3) # does not compile: String.new expects Pointer(UInt8), not Pointer(Char)
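For contrast, a minimal sketch of the version that does compile today, spelling the bytes out as UInt8:

p = Pointer(UInt8).malloc(3)
p[0] = 'a'.ord.to_u8
p[1] = 0_u8
p[2] = 'b'.ord.to_u8
s = String.new(p, 3) # => "a\u0000b"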
I’m not sure I’m reading your comment correctly, but it seems you’re confusing Crystal’s Char type with C’s char. They share a name but are fundamentally different types. C’s char represents an 8-bit character and is mostly equivalent to Crystal’s UInt8 (alias LibC::Char = UInt8). Crystal’s Char is 4 bytes wide and represents a Unicode code point, as in UTF-32 encoding.
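A quick way to see the size difference (a small illustrative sketch):

puts sizeof(Char)       # => 4, a 32-bit Unicode code point
puts sizeof(LibC::Char) # => 1, C's char, aliased to UInt8
puts 'a'.ord            # => 97, the code point value as an Int32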
While a string consists of characters, String uses UTF-8 encoding for space efficiency. That means the data is a sequence of 8-bit code units (one to four per character), which makes the format compatible with a char pointer in C.
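For example, here is a string where characters and bytes don't line up one to one (a small illustrative sketch):

s = "héllo"
puts s.size     # => 5, characters (code points)
puts s.bytesize # => 6, 'é' takes two bytes in UTF-8
p s.bytes       # => [104, 195, 169, 108, 108, 111]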
Bottom line: the Char type corresponds to a 32-bit unsigned integer in C. I don’t think there are any C APIs that use that to represent strings, so you should never need Pointer(Char) for library bindings.
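To make the binding side concrete, a hypothetical lib declaration (LibFoo and foo_parse are made-up names) showing how a C string parameter maps to a UInt8 pointer:

lib LibFoo
  # char* in C maps to LibC::Char*, i.e. Pointer(UInt8), never Pointer(Char)
  fun foo_parse(input : LibC::Char*) : LibC::Int
end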
Yep, that’s already String#to_unsafe. It applies implicitly when passing a String to a lib function.
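Continuing the hypothetical LibFoo sketch from above, both of these calls pass the same UInt8 pointer:

LibFoo.foo_parse("hello")           # String#to_unsafe is applied implicitly
LibFoo.foo_parse("hello".to_unsafe) # the same call, spelled out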
That makes a lot of sense. I was mistaken and I think the compiler was saving me from myself. I will have to look into my C lib to make sure I am doing everything correctly.
The C/C++ equivalent of Crystal’s Char is wchar_t.
A wchar_t is not guaranteed to hold a Unicode code point, only a code unit of some encoding, which happens to be UTF-16 on Windows. Thus char32_t is probably closer. (This distinction is important because Crystal’s Win32 bindings use LibC::WCHAR everywhere.)
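On the Crystal side, that is why UTF-16 conversion goes through a Slice(UInt16) rather than any Char pointer; a minimal sketch:

utf16 = "héllo".to_utf16 # Slice(UInt16) of UTF-16 code units
puts utf16.size          # => 5, each of these characters fits in one code unit
# utf16.to_unsafe is the Pointer(UInt16) a WCHAR-based Win32 API would expect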
Exactly. In most places (am I just thinking of Linux?) it will be 32 bits, but it may not be… One thing that I really dislike in the C and C++ standards is that they define almost nothing exactly; too much depends on the platform. I understand that the platform spectrum for C is really huge, but even so…
I think Crystal’s idea to name the integer types Int32, Int64, etc. was really a good choice: there’s no doubt about the type size, so there’s no need for a configure script to check a lot of things before compiling. Now let’s wait and see if the future agrees with that decision.
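A small illustration of the contrast (the LibC::Long size shown assumes a 64-bit Linux target):

puts sizeof(Int32)      # => 4, on every platform
puts sizeof(Int64)      # => 8, on every platform
puts sizeof(LibC::Long) # => 8 here, but 4 on Windows: the C name hides the size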