Is increasing array index size in the future?

bioinformatics user can use it happily, even it’s very slow.

If it’s really slow, I don’t think they’ll be very happy :sweat_smile:

A thing to note: Array is a very generic collection type which should work well for many purposes.
But it’s actually quite similar to Slice with the main difference that Array can grow dynamically. That makes it easy to use when you don’t know in advance how much space you need and it grows to fit. The way this mechanism is implemented doesn’t really work well for large sizes, though. Reallocating a buffer in the size of gigabytes is quite costly.
So dynamic allocations that big would better use a different approach anyway. (ref Is increasing array index size in the future? - #3 by RX14).

Slice should certainly support sizes beyond Int32::MAX.

5 Likes

If you override BigArray#each then there are no significant differences between the two:

class BigArray(T)
  def each(& : T ->)
    each_index64 do |i|
      yield unsafe_fetch(i)
    end
  end

  # `Indexable#each_index` yields `Int32`s instead
  def each_index64(& : Int64 ->) : Nil
    i = 0_i64
    while i < size
      yield i
      i += 1
    end
  end
end

My guess is that if i is still an Int32 then it could overflow before it reaches the higher indices, whereas if i and size have the same type then LLVM is smart enough to prove that the line i += 1 cannot overflow, and drop the overflow check on each iteration. (Note that simply replacing it with i &+= 1 is wrong as it’d end up with unsafe_fetch(Int32::MIN).)

4 Likes

Actually I wasn’t as rigorous with my language as I could have been.

For my use cases that need larger indexing, I am using Slices, because I know their sizes before their creation, and they never grow|shrink. I kind of forgot in Crystal Array are dynamic, so their capacity changes have to be managed.

So yes, I currently really need larger indices for Slices, but it would also be nice to have them for Arrays too, which would provide more flexibility in designing code.

Nice catch. I overrode each_index to yield Int64 values since Indexable methods can take any Int type and, sure enough, iteration is consistently 0.48% slower (very minor, but the consistency is notable) instead of 80-81%.

1 Like