bioinformatics user can use it happily, even it’s very slow.
If it’s really slow, I don’t think they’ll be very happy
bioinformatics user can use it happily, even it’s very slow.
If it’s really slow, I don’t think they’ll be very happy
A thing to note: Array
is a very generic collection type which should work well for many purposes.
But it’s actually quite similar to Slice
with the main difference that Array
can grow dynamically. That makes it easy to use when you don’t know in advance how much space you need and it grows to fit. The way this mechanism is implemented doesn’t really work well for large sizes, though. Reallocating a buffer in the size of gigabytes is quite costly.
So dynamic allocations that big would better use a different approach anyway. (ref Is increasing array index size in the future? - #3 by RX14).
Slice
should certainly support sizes beyond Int32::MAX
.
If you override BigArray#each
then there are no significant differences between the two:
class BigArray(T)
def each(& : T ->)
each_index64 do |i|
yield unsafe_fetch(i)
end
end
# `Indexable#each_index` yields `Int32`s instead
def each_index64(& : Int64 ->) : Nil
i = 0_i64
while i < size
yield i
i += 1
end
end
end
My guess is that if i
is still an Int32
then it could overflow before it reaches the higher indices, whereas if i
and size
have the same type then LLVM is smart enough to prove that the line i += 1
cannot overflow, and drop the overflow check on each iteration. (Note that simply replacing it with i &+= 1
is wrong as it’d end up with unsafe_fetch(Int32::MIN)
.)
Actually I wasn’t as rigorous with my language as I could have been.
For my use cases that need larger indexing, I am using Slices
, because I know their sizes before their creation, and they never grow|shrink. I kind of forgot in Crystal Array
are dynamic, so their capacity changes have to be managed.
So yes, I currently really need larger indices for Slices
, but it would also be nice to have them for Arrays
too, which would provide more flexibility in designing code.
Nice catch. I overrode each_index
to yield Int64
values since Indexable
methods can take any Int
type and, sure enough, iteration is consistently 0.48% slower (very minor, but the consistency is notable) instead of 80-81%.