Is increasing the array index size planned for the future?

The sequential access itself is fast, but sweeping over such a large array, together with the intermittent reallocations, keeps invalidating your entire L3 cache, hence the erratic run times (don’t forget that all other processes on your system are also competing for that cache). The L3 cache size is most certainly well below Int32::MAX, so Int32 versus Int64 shouldn’t make a noticeable difference in that benchmark.
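
As a rough sketch of the effect (this is my guess at the benchmark shape, not the original code, and the element count is arbitrary): appending ten million Int32s touches roughly 40 MB of memory, which is enough to flush a typical L3 cache on its own, and every reallocation copies the whole buffer on top of that.

```crystal
arr = [] of Int32

elapsed = Time.measure do
  10_000_000.times do |i|
    arr << i # intermittently triggers a reallocation plus a full copy
  end
end

puts "Appended #{arr.size} elements in #{elapsed.total_milliseconds.round(1)} ms"
```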

One important piece of perspective here is that 32-bit linear memory was rare when C/C++ first appeared, so it mattered that size_t could be smaller than 32 bits, so as not to waste any upper bits. It is not a suggestion that making it 64 bits on a 64-bit system would keep the same data structures or algorithms equally fast. This is actually a problem for AVR as well if one tries to use Array directly there; Array#size won’t become an Int64, and if it ever becomes a platform-dependent integer type, that would be to support legacy / embedded targets, rather than for performance.
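
To make the current state concrete (this is just what the standard library does today, not a proposal):

```crystal
arr = [1, 2, 3]

puts typeof(arr.size) # => Int32
puts Int32::MAX       # => 2147483647, the hard cap on element count
```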

There is also the problem that the larger a contiguous allocation is, the more false-positive references to it the Boehm GC will identify, so the block of memory stays alive longer than necessary. The standard library specs already allocate several 1 GiB buffers, and often they simply don’t go away before the whole suite finishes.
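
A hedged illustration (the size is arbitrary, and whether the block really survives depends on what happens to be lying around in scanned memory at that moment):

```crystal
# 1 GiB is still well within Int32::MAX bytes, but any stray word that
# happens to look like a pointer into this range will pin the whole block.
buffer = Bytes.new(1024 * 1024 * 1024)

buffer = nil # drop the only real reference
GC.collect   # the conservative collector may still consider it reachable
```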

A genuine application for 64-bit sizes is memory-mapped I/O, which doesn’t involve allocations at all; the best way to represent that data would be some kind of Slice with a 64-bit size field. See also: `File.read`ing a file bigger than 2GB results in an `OverflowError` · Issue #14485 · crystal-lang/crystal · GitHub
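
Until something like that exists, a workaround for the linked issue is to stream the file through a fixed-size buffer instead of `File.read`ing it in one go. A minimal sketch, assuming a hypothetical `huge_file.bin` larger than 2 GB:

```crystal
File.open("huge_file.bin") do |file|
  buffer = Bytes.new(1024 * 1024) # 1 MiB chunk, comfortably below Int32::MAX
  total = 0_i64                   # 64-bit running total
  while (read = file.read(buffer)) > 0
    total += read
    # process buffer[0, read] here
  end
  puts "Read #{total} bytes"
end
```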
