Opinion on best practice for arrays/slices

This is a simple question, to get devs opinion on.

If I have an array/slice that is dynamically created in the program at runtime, and I know its size, is it better (faster, uses less memory, etc) to explicitly create the full size of the array/slice at once, and then explicitly index elements to store values (I know what the indices are)

arry = Array(Int32).new(size_value) 
....

arry[i] = data

or is it “better” to just init the array and then push data into the arry (again, I control the order so it will fill up correctly)?

arry = Array(Int32).new
....

arry << data

The latter takes less code, but I assume by telling the compiler to create the full array, memory will/can be allocated/accessed more efficiently, or does it really make any real difference in the real world?

You can just combine both approaches:

arry = Array(Int32).new(size_value)
arry << data

Btw. your former example doesn’t even work because size_value is just the capacity of the internal buffer and arry's size is 0, thus you can’t insert values with #[]=. You would need to add pass a default value to the constructor.

I guess that was confusing. size_value just meant the value length for the array.

If you’re interested in array-related performance comparisons, I have some examples here: ai4cr/array_bencher_spec.cr at master · drhuffman12/ai4cr · GitHub .

1 Like

If you’re doing it once per larger unit of work, creating an array is perfectly fine, but if you know the resulting size that the array will be, I would 100% pass that to Array.new because it’s effectively free performance.

To create an array of 7 elements, for example, you are cumulatively allocating enough elements in the underlying buffer to hold 21 elements:

array = Array(Int32).new # capacity = 0
array << 1 # ALLOCATION, capacity = 3, see https://github.com/crystal-lang/crystal/blob/master/src/array.cr#L2067-L2069
array << 2
array << 3
array << 4 # ALLOCATION, capacity = 6, previous 3 freed
array << 5
array << 6
array << 7 # ALLOCATION, capacity = 12, previous 6 freed

Since each of these allocations isn’t simply extending the original buffer but instead moving it to a whole new space in memory, so each reallocation of the underlying buffer is a complete representation of the array in memory.

You can see the impact with a quick benchmark:

Benchmark code here (only difference between them is the initial_capacity parameter)
require "benchmark"

array_size = ENV.fetch("ARRAY_SIZE", "7").to_i

array = [] of Int32
Benchmark.ips do |x|
  x.report "simple" do
    array = Array(Int32).new
    array_size.times { |i| array << i }
  end

  x.report "initialized to size" do
    array = Array(Int32).new(initial_capacity: 7)
    array_size.times { |i| array << i }
  end
end

pp array.size # Ensure LLVM doesn't optimize it out
$ ARRAY_SIZE=4 crystal run --release bench_array.cr
             simple  17.06M ( 58.62ns) (± 1.27%)  80.0B/op   1.54× slower
initialized to size  26.24M ( 38.11ns) (± 1.36%)  64.0B/op        fastest
4
$ ARRAY_SIZE=7 crystal run --release bench_array.cr
             simple  10.84M ( 92.29ns) (± 0.74%)   144B/op   2.04× slower
initialized to size  22.05M ( 45.35ns) (± 1.14%)  64.0B/op        fastest
7
$ ARRAY_SIZE=25 crystal run --release bench_array.cr
             simple   4.83M (207.12ns) (± 0.65%)  464B/op   1.39× slower
initialized to size   6.71M (149.00ns) (± 0.66%)  256B/op        fastest
25

If that code path is being hit less than about 1000x/sec, though, I wouldn’t worry too much about optimizing via other data structures. Arrays are super flexible and when you’re building something out. Even Slice will usually allocate on the heap unless you tell it to use stack memory that you allocated yourself. This is a lot of work to save a handful of nanoseconds, so if those nanoseconds aren’t adding up to seconds you may want to stick with the easy thing. :slightly_smiling_face:

2 Likes

Thanks @jgaskins for your detailed response. One reason I asked this is because I’ve seen different behavior in other languages.

It seems intuitive that if you allocate all the needed memory for the array at its creation there will be less work, as you show, to have to reallocate memory to increase the array when adding more elements. In my use though, it didn’t seem to make much differences in most cases, so you kind of confirm my intuition.

What I take from this then, from a strictly efficiency/performance perspective, it’s best to allocate the full array at creation if you know it, even if you use << to load it with data, otherwise it won’t make much difference in performance if you’re not filling the array with data too quickly.

1 Like