The Crystal Programming Language Forum

Get offset of a Tuple item

Hi, I have a use case where I need to get the offset of a tuple item, which may differ depending on memory alignment. I can detail the use case if necessary.

The offsetof pseudo-method currently accepts only instance variables as its second argument, so my suggestion is to extend it: for Tuples, accept an Int to get the offset of the nth item in the tuple.
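To make the suggestion concrete, here is a minimal sketch of what works today next to what I am proposing. The Point class is made up for illustration, and the tuple form in the last comment is only the proposed syntax, not something the compiler accepts at the time of writing.

```crystal
# Hypothetical class, used only to show the existing instance-variable form.
class Point
  @x = 0_i8
  @y = 0_i64
end

# Existing form: the second argument is an instance variable.
puts offsetof(Point, @x)
puts offsetof(Point, @y)

# Proposed form (not valid yet): the second argument is a tuple index.
# offsetof({Int32, Int8, UInt8*}, 2)
```

The printed values depend on the platform and on the object header, which is exactly why I would rather ask the compiler than compute them by hand.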

A similar thing can be done for NamedTuple as well, accepting a symbol or a string literal.

Is this feasible? Is there already a way to get the offset of a tuple item that I don’t know about?

From what I understand, the magic happens only in def visit(node : OffsetOf) (crystal/main_visitor.cr at master · crystal-lang/crystal · GitHub) and parse_offsetof (crystal/parser.cr at master · crystal-lang/crystal · GitHub), am I right?

If you guys feel comfortable with the idea I can try to write a patch.

At least for tuples it sounds reasonable.

For NamedTuples I guess there might be some details regarding permutations of keys. A named tuple of type {a: Int32, b: String} can be used as a {b: String, a: Int32}. Either there is a canonical representation or the compiler converts one to the other (I don’t recall). In the latter scenario the offsets will not work, but the type restrictions might not complain.

Regarding where to implement it: yes, those seem to be the right places.

If you need tuples, I would say start with those.

Thanks, I’ll try to write a patch for tuples in the next few days, once I have some free time.

Patch done :smiley:

Should I create an issue for the PR, or is the PR itself enough?

PR is fine :+1: There’s no need for an issue when it’s already as good as fixed =)

(On the off chance that the PR doesn’t work out, an issue can still be created when the PR is closed)

Could you detail your use case?

About my use case:

I was writing a paged GapBuffer implementation as an experiment, to measure the work needed to write a text-editor widget for GTK that can load huge files (50M-100M, e.g. log files) very fast.

I was trying to avoid memory copies as much as possible and read the files in 8 KB chunks on demand. To avoid the overhead of copying strings, I wrote an unsafe hack that creates String objects without copying the memory. The ugly hack is this:

class String
  enum Unsafe
    Yeah
  end

  def self.new(slice : Bytes, unsafe : Unsafe) : String
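    # Allocate room for the whole header tuple written below; HEADER_SIZE + sizeof(UInt8*)
    # would ignore the padding between the Int8 and the pointer.
    # Note: malloc_atomic memory is not scanned by the GC, so the caller must keep the slice alive.
    str = GC.malloc_atomic(sizeof({Int32, Int32, Int32, Int8, UInt8*})).as(UInt8*)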
    str_header = str.as({Int32, Int32, Int32, Int8, UInt8*}*)
    str_header.value = {TYPE_ID, slice.size, slice.size, -1_i8, slice.to_unsafe}
    str.as(String).initialize
    str.as(String)
  end

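  def to_unsafe
    # offsetof on a tuple was needed here: the hardcoded 4 is the distance from @c
    # to the pointer stored after it, which depends on alignment.
    # @c == 255 marks a String built by the unsafe constructor above.
    @c == 255 ? (pointerof(@c) + 4).as(UInt8**).value : pointerof(@c)
  end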
end

abc = "Abc"
slice = "Abc".to_slice
unsafe_str =  String.new(slice, String::Unsafe::Yeah)
puts unsafe_str
pp! abc
pp! abc.to_unsafe
pp! unsafe_str
pp! unsafe_str.to_unsafe

It still has the overhead of malloc’ing a few bytes, and to_unsafe won’t return a \0-terminated string, but at least I can use all the String methods and pass these pieces of strings to any place in the stdlib that expects a String.

The hack works because, as far as I remember, 255 isn’t a valid UTF-8 byte. I could have stored this flag in the spare space the String object still has due to memory alignment, but I remember seeing a branch that used this extra space for something, so this way the hack is safer against future internal changes to String.

Ah, as this hack was an experiment, I didn’t bother with size vs bytesize differences.

So instead of that 4 you would use offsetof? But in this case the tuple was known to you, so using a hardcoded value is fine. Or just sizeof(Int32).

I don’t think we need to change the language or introduce this change just for this, especially since this is just an experiment. I can’t imagine any use case for this. In fact, for me this example isn’t a valid use case.

The reason is: this makes the language more complex but it’s probably not worth it.

But alignment can change from platform to platform. Otherwise offsetof wouldn’t need to exist, since people could always just sum sizeof expressions…
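A small sketch of what I mean, assuming a made-up Padded struct (it is not part of any library): adding up the sizes of the preceding fields does not necessarily give the real offset, because the compiler may insert padding.

```crystal
# Hypothetical struct: a 1-byte field followed by a 4-byte field.
struct Padded
  @flag = 0_i8
  @value = 0_i32
end

# Naive guess: the second field starts right after the first one.
naive = sizeof(Int8)

# What the compiler actually chose, including any alignment padding.
actual = offsetof(Padded, @value)

puts "sum of preceding sizes: #{naive}"
puts "actual offset:          #{actual}"
```

On the platforms I have used the two numbers differ, and the gap is padding that a hand-written sum would miss.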

Yes.

For a class it’s not clear what’s the order of instance variables. For a tuple it’s very clear. So I don’t feel there’s a need for this.

Maybe I’m a bit biased, but I still think it’s useful: it is less error-prone than doing the math yourself, and it is independent of platform alignment.

Yes, but you have to weigh a one-off thing that has a workaround against introducing a feature, documenting it and maintaining it.