[RFC] Automatic conversion of wrapper structs to C

While reading this blogpost about the journey of adding Z3 bindings to Crystal, I was going to suggest that the author start using to_unsafe in their code so they don’t have to manually convert things from Crystal to C.

But then I realized that wouldn’t always work… I’ll try to explain why.

In Z3 there’s an opaque type, LibZ3::Ast = Void*. Then we can define a wrapper for it:

module Z3
  struct IntExpr
    def initialize(expr : LibZ3::Ast)
      @expr = expr
    end
  end
end

By defining to_unsafe in that struct we can directly pass such instances to C. So far, so good!
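As a minimal sketch of that mechanism: Z3 isn't linked here, so this uses LibC.strlen as a stand-in C function, and CStr is a hypothetical wrapper struct invented for the example. The only part that matters is that the compiler calls to_unsafe for you when the value is passed to a C fun.

```crystal
# CStr is a hypothetical wrapper struct used only for illustration.
struct CStr
  def initialize(@ptr : LibC::Char*)
  end

  # The compiler calls this when a CStr is passed to a C fun.
  def to_unsafe
    @ptr
  end
end

text = "hello"
wrapped = CStr.new(text.to_unsafe)

# No explicit conversion: the compiler passes wrapped.to_unsafe to strlen.
length = LibC.strlen(wrapped)
puts length
```

A Z3::IntExpr with a to_unsafe that returns @expr would be passed to LibZ3 funs in the same way.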

However, in some places the C API accepts a pointer of LibZ3::Ast together with a length: the usual way one passes lists of things to C.

If in Crystal you are working with the high-level interface, you are working with Z3::IntExpr instances. If you have a bunch of these, you will have an Array(Z3::IntExpr). Then if you want to pass that array to C, you have to do something like:

array.map(&.to_unsafe)

That is, go from Array(Z3::IntExpr) to Array(LibZ3::Ast). Because there’s Array#to_unsafe, the result can be passed to C automatically: to_unsafe on this array returns a Pointer(LibZ3::Ast).

So with all of this I had some observations:

  • That map(&.to_unsafe) call will allocate a new array, but the bytes of the new array will be exactly the same as those of the original. After all, a wrapper struct holds the same bytes as the value it wraps. So, first, it’s inefficient to do so. In fact, in the compiler’s code we do something different, something like array.to_unsafe.as(LibZ3::Ast*). That is, we assume the to_unsafe call has a pointer that points to the same memory layout we need.
  • It seems wrapper structs are very common. We can have a to_unsafe, but why not make it simpler?
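The two options in the first observation can be compared in a small sketch. Handle is a hypothetical wrapper struct, and Int32 stands in for a C type such as LibZ3::Ast so the example runs without Z3.

```crystal
# Handle is a hypothetical wrapper struct; Int32 stands in for a C type.
struct Handle
  def initialize(@raw : Int32)
  end

  def to_unsafe
    @raw
  end
end

handles = [Handle.new(10), Handle.new(20), Handle.new(30)]

# Option 1: allocates a second array holding the same bytes.
copied = handles.map(&.to_unsafe) # Array(Int32)

# Option 2: reinterpret the existing buffer, no allocation.
# Only valid while Handle has exactly one instance variable.
casted = handles.to_unsafe.as(Int32*) # Pointer(Int32)

same_size = sizeof(Handle) == sizeof(Int32)
same_contents = Slice.new(casted, handles.size).to_a == copied
puts same_size
puts same_contents
```

The cast is unchecked: if Handle ever gained a second instance variable, option 2 would still compile and silently hand C the wrong layout.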

So here’s the final idea:

  • If you define a Crystal struct, and that struct contains exactly one instance variable, and that variable’s type matches the C type, just let it pass, without any extra work. Let’s call this a “wrapper struct”.
  • If you have a pointer of a wrapper struct, and the C function expects a pointer of the wrapped struct, just let it pass: the memory layout behind the pointer will match.
  • If you have a type that doesn’t match the C function argument, try to call to_unsafe on it. And here apply the same rules again: if that type matches, it’s all good. And it’s also good if the to_unsafe call returns a pointer of a wrapper struct, and the C function expects a pointer of a wrapped struct.

I created a draft PR for this. It was very easy to implement! And I tried this in the compiler’s code, and also in the Z3 bindings that are being created. The code ends up being so much simpler.

Is there a catch? Like everything, yes. I only see one downside of this: it’s implicit behavior. So if you have a struct that wraps a C value, but then you also have some other instance vars, it won’t work out of the box. You will have to define a to_unsafe method. And of course arrays of these types won’t work. Other instance vars can also be added by reopening a type, so your code might stop compiling as soon as someone does that.

BUT. I think in most cases:

  • wrapper structs just wrap one value. It’s uncommon to also include more data
  • reopening wrapper structs is very, very uncommon. I’d say it simply doesn’t happen. Nobody wants to mess with a type that wraps a C type. Nobody wants to mess with C :-D
  • you can always define a to_unsafe if the implicit wrapper struct doesn’t work for you

What do you think?

I sent a few different proposals in the past, but I feel this proposal actually adds value to the language, by simplifying interfacing with C, and also by ending up with more efficient code (no need to map(&.to_unsafe)).

Regarding the catch, one way to improve this would be to add a new annotation, let’s say @[WrapperStruct]. If you put it on a type, then all of the following must be true:

  • The type must be a struct
  • The type must have exactly one instance variable

That way if you or someone reopens that type and adds a new instance variable, you will get an error that says “Type X is a wrapper struct so it can only contain exactly one instance variable, no more” instead of getting an error at a random call site.
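The annotation doesn't exist today, but a similar check can be roughly approximated with a macro. This is only a sketch: SingleIvarCheck, assert_wrapper_struct and Token are hypothetical names, and unlike the proposed annotation the check only fires when the method is actually instantiated (here, when Token.new is called somewhere).

```crystal
# SingleIvarCheck is a hypothetical helper, not part of the standard library
# and not the proposed @[WrapperStruct] annotation.
module SingleIvarCheck
  # Fails compilation with a clear message if the including type
  # is not a struct with exactly one instance variable.
  def assert_wrapper_struct : Nil
    {% unless @type.struct? %}
      {% raise "#{@type} is a wrapper struct so it must be a struct" %}
    {% end %}
    {% if @type.instance_vars.size != 1 %}
      {% raise "#{@type} is a wrapper struct so it can only contain exactly one instance variable" %}
    {% end %}
  end
end

struct Token
  include SingleIvarCheck

  def initialize(@raw : Int32)
    assert_wrapper_struct
  end

  def to_unsafe
    @raw
  end
end

token = Token.new(7)
puts token.to_unsafe
```

Reopening Token elsewhere and adding a second instance variable would then fail at compile time with the message above, rather than at some unrelated call site.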

That said, I’m fine with or without that annotation.

In the gtk4 bindings I need to use map(&.to_unsafe) too, however most of the types are like:

class Label
  @pointer : Pointer(Void)
  # ...
  def to_unsafe
    @pointer
  end
end

I use a void pointer to simplify the generator, but this is a pointer to a struct allocated by GLib that needs to be freed on finalize, which also explains why the wrapper is a class, not a struct. However, in some cases where the C type is meant to live on the stack, the wrapper is a struct.

I like the idea, but maybe it’s not generic enough and could end up being an obscure feature like Ruby’s flip-flop operator :thinking:

I’ve never had to do .map(&.to_unsafe) so I won’t comment on that case. I agree that the code in the Z3 bindings would be better if the author had some understanding of how to leverage to_unsafe, but I’m a bit skeptical of adding magic to avoid the need to know about it in some cases but not all.

Partly because I am not certain I see why structs are different from regular objects when it comes to interfacing with C, and partly because it is such a special case. I haven’t bound all that many C libraries, but the only one of my bindings that would get a free to_unsafe wouldn’t be affected in the first place, as the type is never passed around. So I have to wonder: is this solving an actual problem?

“That is, we assume the to_unsafe call has a pointer that points to the same memory layout we need.”

I’m thinking that people who interface directly with C bindings kinda need to understand the memory layout anyhow. Especially people who directly share Crystal data structures with C code.

I’m wondering if to_unsafe could be made easier to find though. Right now the only place I see it referenced is the special header under fun in the book. If you don’t read that particular header you won’t know about it. Perhaps some of the examples of other things related to C bindings should make use of it with a reference to the header as well?