Get offset of a Tuple item

hugopl · January 7, 2021, 3:48pm

Hi, I have a use case where I need to get the offset of a tuple item, which may differ depending on memory alignment. I can detail the use case if necessary.

The offsetof pseudo-method currently accepts only instance variables as its second argument, so my suggestion is to extend it: for Tuples, accept an Int to get the offset of the nth item in the tuple.

A similar thing could be done for NamedTuple as well, accepting a symbol or a string literal.

Is this feasible? Is there already a way to get the offset of a tuple item that I don’t know about?

From what I understood, the magic happens only at def visit(node : OffsetOf) (crystal/main_visitor.cr at master · crystal-lang/crystal · GitHub) and parse_offsetof (crystal/parser.cr at master · crystal-lang/crystal · GitHub), am I right?

If you guys feel comfortable with the idea I can try to write a patch.

bcardiff · January 7, 2021, 5:19pm

At least for tuples it sounds reasonable.

For NamedTuples I guess there might be some details regarding permutations of keys. A named tuple of type {a: Int32, b: String} can be used as a {b: String, a: Int32}. Either there is a canonical representation or the compiler converts one to the other (I don’t recall). In the latter scenario the offsets will not work, but the type restrictions might not complain.

Regarding the implementation places, yeah. It seems right.

If you need tuples I would say to start with those.

hugopl · January 7, 2021, 6:33pm

Thanks, I’ll try to write a patch for tuples in the next few days, once I have some free time.

hugopl · January 7, 2021, 8:58pm

Patch done

Should I create an issue for the PR, or is the PR itself enough?

straight-shoota · January 8, 2021, 12:18am

A PR is fine. There’s no need for an issue when it’s already as good as fixed =)

(On the off chance that the PR doesn’t work out, an issue can still be created when the PR is closed)

asterite · January 13, 2021, 11:31pm

Could you detail your use case?

hugopl · January 14, 2021, 2:44pm

About my use case:

I was writing a paged GapBuffer implementation as an experiment, to measure the work needed to write a text-editor widget for GTK that can load huge files (50M-100M, e.g. log files) very fast.

I was trying to avoid memory copies as much as possible and to read the files in 8 KB chunks on demand. To avoid the overhead of string copying, I wrote an unsafe hack that creates String objects without copying the memory. The ugly hack is this:

class String
  enum Unsafe
    Yeah
  end

  def self.new(slice : Bytes, unsafe : Unsafe) : String
    str = GC.malloc_atomic(HEADER_SIZE + sizeof(UInt8*)).as(UInt8*)
    str_header = str.as({Int32, Int32, Int32, Int8, UInt8*}*)
    str_header.value = {TYPE_ID, slice.size, slice.size, -1_i8, slice.to_unsafe}
    str.as(String).initialize
    str.as(String)
  end

  def to_unsafe
    # offsetof tuple was needed here.
    @c == 255 ? (pointerof(@c) + 4).as(UInt8**).value : pointerof(@c)
  end
 
end

abc = "Abc"
slice = "Abc".to_slice
unsafe_str =  String.new(slice, String::Unsafe::Yeah)
puts unsafe_str
pp! abc
pp! abc.to_unsafe
pp! unsafe_str
pp! unsafe_str.to_unsafe

It still has the overhead of a malloc of a few bytes… and to_unsafe won’t return a \0-terminated string, but at least I can use all the String methods and pass these pieces of strings to any place in the stdlib that expects a String.

The hack works because, from what I remember, 255 isn’t a valid UTF-8 byte… I could store this info in the space still left in the String object due to memory alignment, but I remember seeing some branch that used this extra space for something, so this way the hack is safer against future internal changes to the String object.
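As a quick sanity check of the 255 assumption, here is a minimal sketch. It only shows that a lone 0xFF byte is rejected as UTF-8 and that a sample valid string does not contain it; the variable names are made up for illustration, and a String built from invalid bytes could still start with 0xFF and be misread by the hack.

```crystal
# A String made of the single byte 0xFF (the marker value used in the hack).
lone = String.new(Bytes[0xFF_u8])
puts lone.valid_encoding? # => false

# Bytes of a valid UTF-8 string, including a multi-byte character.
sample_has_marker = "héllo".bytes.includes?(0xFF_u8)
puts sample_has_marker # => false
```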

hugopl · January 14, 2021, 2:59pm

Ah, as this hack was an experiment, I didn’t bother with size vs bytesize differences.

asterite · January 14, 2021, 3:05pm

So instead of that 4 you would use offsetof? But in this case the tuple was known to you, so using a hardcoded value is fine. Or just sizeof(Int32).

I don’t think we need to change the language or introduce this change for just this, especially since this is just an experiment. I can’t imagine any use case for this. In fact, this example isn’t a valid use case for this for me.

The reason is: this makes the language more complex but it’s probably not worth it.

hugopl · January 14, 2021, 5:05pm

But alignment can change from platform to platform; otherwise offsetof wouldn’t exist, since people could always just do a sum of sizeof expressions…
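To illustrate the point with what offsetof already supports (instance variables), here is a minimal sketch. FakeHeader and its field names are hypothetical; they only mirror the tuple {Int32, Int32, Int32, Int8, UInt8*} from the hack, and the sketch assumes the compiler lays the fields out in declaration order.

```crystal
# Hypothetical struct mirroring {Int32, Int32, Int32, Int8, UInt8*}.
struct FakeHeader
  @type_id = 0_i32
  @bytesize = 0_i32
  @length = 0_i32
  @c = 0_i8
  @data = Pointer(UInt8).null
end

# Naive offset of @data: the sum of the sizes of the fields before it (13).
naive = sizeof(Int32) * 3 + sizeof(Int8)

# Offset as computed by the compiler, alignment padding included.
actual = offsetof(FakeHeader, @data)

puts "sum of sizeof: #{naive}"
puts "offsetof:      #{actual}"
puts "padding:       #{actual - naive}"
```

The sizeof sum is 13 bytes, but a pointer normally has to be aligned, so the offset reported by offsetof should be larger, and by how much depends on the target platform.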

hugopl · January 14, 2021, 5:07pm

Yes.

asterite · January 14, 2021, 5:17pm

For a class it’s not clear what’s the order of instance variables. For a tuple it’s very clear. So I don’t feel there’s a need for this.

hugopl · January 14, 2021, 5:25pm

Maybe I’m a bit biased, but I still think it’s useful: less error-prone than doing the math yourself, and independent of platform alignment.

asterite · January 14, 2021, 5:43pm

Yes, but you have to weigh that one-off thing that has a workaround against introducing a feature, documenting it and maintaining it.