Currently the index of arrays is maxed at Int32::MAX, i.e. (2**31 - 1) => 2147483647.
Are there plans to increase that to at least Int64::MAX?
I seriously need to create|address larger arrays.
I think any addition of larger arrays would be its own array type, optimized to not create the kind of massive contiguous allocations required by an array larger than u32_max. What's the use case you're running into here @jzakiya?
I want to generate as many prime numbers as possible.
The program below will generate primes until you run out of memory.
def sozp240(val, residues)
  res = residues
  md, rscnt = res.last - 1, res.size
  kmax = (val - 2) // md + 1
  prms = Array(UInt64).new(kmax, 0)
  modk, r, k = 0, -1, 0
  loop do
    if (r += 1) == rscnt; r = 0; modk += md; k += 1 end
    next if prms[k] & (1u64 << r) != 0
    prm_r = res[r]
    prime = modk + prm_r
    break if prime > Math.isqrt(val)
    res.each do |ri|
      kn, rn = (prm_r * ri - 2).divmod md
      bit_r = 1u64 << res.index(rn + 2).not_nil!
      kpm = k * (prime + ri) + kn
      while kpm < kmax; prms[kpm] |= bit_r; kpm += prime end
    end
  end
  prms
end
The program encodes the primes within the prime candidates array prms. I extract the primes individually from prms to use in another routine.
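To make the setup concrete, here is a hedged sketch of the residues table sozp240 presumably takes (my reconstruction; the post doesn't show it). For modulus 240 there are phi(240) = 64 residues coprime to 240, which is exactly why each resgroup fits in one UInt64:

```crystal
# Hypothetical residues table for sozp240 (my reconstruction; the post
# doesn't show it). md = res.last - 1 must equal 240, and each UInt64
# resgroup holds one bit per residue, so there must be exactly 64 of them.
residues = (3..241).select { |n| n.gcd(240) == 1 }

puts residues.size  # 64 == phi(240), one bit per residue
puts residues.first # 7 (2, 3, 5 divide the modulus and are excluded)
puts residues.last  # 241, so md = res.last - 1 = 240

# Then: prms = sozp240(1_000_000, residues)
```

A clear bit r in prms[k] would then correspond to the surviving candidate 240*k + residues[r].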
Currently I'm array-size limited to only doing 19 billion primes (~45 minutes with a single core). The max array index is (2**31 - 1) = 2,147,483,647 elements.
The 19 billionth prime is 491_701_819_267.
That takes (491_701_819_267 - 2) // 240 + 1 = 2,048,757,581 residue groups of 64 bits. That's 2_048_757_581 * 8 = 16,390,060,648 bytes (16.39 GB) of memory.
I have 40GB on my laptop, so I can|want to do a lot more.
The 40 billionth prime is 1,066,173,339,601. That would take an array of 1,066,173,339,601 // 240 = 4,442,388,915 resgroups (index values). That's 35,539,111,320 bytes (35.5GB) of memory.
I can easily do this, and find the max possible with my memory.
That's why I need to be able to create|address larger indexed arrays.
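The memory arithmetic above can be checked in a few lines (primes and the kmax formula are taken from the post; the variable names are mine):

```crystal
# Checking the post's memory arithmetic.
# kmax = (val - 2) // md + 1, md = 240, one UInt64 (8 bytes) per resgroup.
md  = 240_u64
p19 = 491_701_819_267_u64   # the 19 billionth prime
p40 = 1_066_173_339_601_u64 # the 40 billionth prime

k19 = (p19 - 2) // md + 1
k40 = (p40 - 2) // md + 1
puts k19               # 2048757581  -> still (barely) under Int32::MAX
puts k19 * 8           # 16390060648 bytes, ~16.39 GB
puts k40               # 4442388915  -> exceeds Int32::MAX = 2147483647
puts k40 * 8           # 35539111320 bytes, ~35.5 GB
puts k40 > Int32::MAX  # true: too big to be an Array index today
```

So the 19-billion-prime sieve still fits an Int32 index, but the 40-billion one does not, even though the buffer itself fits in 40GB of RAM.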
The same issue exists for String; its size is limited to Int32::MAX.
In bioinformatics, there are many software programs that can only run on computers with 64GB or 128GB of memory, for example STAR and Kraken, and several others. Most of these programs load large dictionary-like data into memory.
That's why my home computer also has 64GB or 128GB of memory: I want to try such software at home. I buy memory and build my own PCs to keep costs down, but memory is easily corrupted. For example, if the result of md5sum changes every time I check it, I can run MemTest86+ to find out that the memory is corrupted. Is it because I always buy cheap memory?
Of course, I don't have the skills to write such complex software myself, but after reading jzakiya's comment, I realized that arrays limited to Int32::MAX elements are not as large as I thought. I agree that Crystal's array and string sizes need to be expanded, and of course I know that Crystal is an OSS project and contributions are important.
In this language string benchmark, Crystal was slower only than Rust, ranking second on the 500M and 1G text tests; however, it failed the larger text file tests.
If Crystal supported big strings, it would have a more impressive result.
Among its language competitors, Rust can support strings up to 29.2 GB, and even Go can support strings up to 17.1 GB.
That benchmark was proven flawed in this forum post so I don’t think it’s worth considering for performance.
I second RX14's suggestion of a separate type, largely because the chance of people/businesses working with such large arrays is so uncommon that it could probably be implemented as a shard, compared to the String limit, which is surprisingly more common.
IIRC someone on the core team said Int32 was used for indexing to reduce the memory footprint of these data structures. That makes sense considering how ubiquitous they are. Increasing the indexable size of an Array from 32->64 bits would increase the baseline footprint of every array in every program from 16->24 bytes (8B pointer + 4B capacity + 4B size → 8B pointer + 8B capacity + 8B size). General-purpose data structures should be optimized for the general case, and N is small in the vast majority of use cases.
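The 16->24 byte estimate can be illustrated with two hypothetical header structs (the names and layouts are mine; Crystal's real Array internals have more fields, this only shows the 32- vs 64-bit size/capacity cost):

```crystal
# Hypothetical header layouts, not the real Array internals.
struct Header32
  @buffer = Pointer(UInt8).null # 8 bytes
  @size = 0_i32                 # 4 bytes
  @capacity = 0_i32             # 4 bytes
end

struct Header64
  @buffer = Pointer(UInt8).null # 8 bytes
  @size = 0_i64                 # 8 bytes
  @capacity = 0_i64             # 8 bytes
end

puts sizeof(Header32) # 16 on 64-bit targets
puts sizeof(Header64) # 24
```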
There's nothing stopping anybody from making a BigArray or BigString class, though. If you just need an array type for common operations like enumeration, indexing, and mutation-by-index, you can use this shard. The implementation is quite simple and most of the behavior lives in the Indexable and Indexable::Mutable mixins.
When you think about how SQL databases are often split between OLTP and OLAP use cases (OLAP DBs scale horizontally for a single query but have a large up-front query cost whereas OLTP databases have a trivial up-front cost but are very limited in how they can parallelize a single query), it makes sense to also optimize data structures for how much data they’re expected to hold and how they’re intended to be used.
But it's also possible that Array might perform better on 64-bit systems with 64-bit indexes considering some arithmetic operations are faster on 64-bit values and a lot of operations are performed on the index. Hard to predict, though, because it would involve larger allocations. Benchmarks would need to be run.
This seems to be a very simple thing to do that is being rationalized into being hard.
Almost every desktop, laptop, phone, RPI, tablet, etc, use 64-bit systems|hardware, thus their native internal registers and memory are laid out to be optimized for operations at their full bit size.
In fact, the 32-bit indexing limit is archaically out of date for hardware optimization, and application necessities.
Allowing for 64-bit indexing doesn't mean you are going to use the full 2^64 memory addresses; it means you can address upper-GB-size arrays, which already require 64-bit index values.
I can’t use Crystal in some of my applications because of this archaic limitation.
32-bits is just 2+ billion array elements. That just doesn’t cut it in 2024 and beyond. I’m forced to use Rust, et al to even think about doing certain applications.
Having 64-bit indexing should be a natural part of the standard system library, and not some external feature you need to install a shard for.
This reminds me of Bill Gates saying 640Kb should be enough for any application.
Do note that most languages, notably Go, C#, Java and Rust as you've mentioned, do not support more than 2^31-1 elements for arrays by default. In particular with Rust, this comment clearly states the limitations of such an array/vector. So it's not unusual that Crystal follows suit. As previously mentioned, it's optimised for the general use case.
Crystal is always improving, and now that it supports Int128, it makes writing bioinformatics software a little easier.
The amount of data we work with is growing every year, and it’s clear that we’ll need to overcome the size limit of Int32::MAX.
I think it's simply a matter of development resources and priorities. I expect Crystal and other languages to have some way of handling large strings and arrays in their standard libraries within the next decade.
I think this exact same thing a lot, and it's surprising how often the reason turns out to be circumstances I'm unaware of that make it more complicated.
You and I have both mentioned in this thread that 64-bit CPUs are optimized to deal with 64-bit values; however, that doesn't mean all 64-bit operations are faster every time. I just wrote up some benchmarks and, on my machine, the BigArray class I wrote that can index larger than 32-bit offsets is 20% slower to allocate and 81% slower to iterate through 1k of the exact same elements. I assumed allocations would be slower, but iteration being slower surprised me considering that both of them are doing the exact same pointer arithmetic on 32- vs 64-bit offsets.
I’m on an ARM64 CPU, so maybe it’ll be different on a different CPU architecture (or even other ARM64 CPUs that are not made by Apple), but this is why I said in my previous post that benchmarks would be important here. Performance is never as straightforward as people think it is. You have to see it running.
require "benchmark"
require "../src/big_array"

puts "Allocation"
values = [Array(String).new, BigArray(String).new]
Benchmark.ips do |x|
  x.report("Array") { values.unsafe_put 0, Array(String).new }
  x.report("BigArray") { values.unsafe_put 1, BigArray(String).new }
end
# Need a side effect to avoid LLVM optimizations, but we don't actually want it to print
pp values unless values

puts
puts "Indexing"
value = ""
Benchmark.ips do |x|
  array = Array(String).new
  big_array = BigArray(String).new
  1_000.times do |i|
    array << i.to_s
    big_array << i.to_s
  end
  x.report("Array") { array.each { |v| value = v } }
  x.report("BigArray") { big_array.each { |v| value = v } }
end
pp value unless value
If you don’t want to use a shard for it, that’s an understandable opinion to have, but sensationalizing your position by saying you’re being “forced” to use another language after I just gave you the blueprint to do what you want in Crystal is not as strong an argument as you might think it is.
I think it would be useful to add a BigArray or BigString shard to verify this.
I want to say this discussion is about what can or cannot be done. Yes, the benchmark above is not very meaningful, but as a competitor, why can Go do it while Crystal can't? That is the key point.
As said in the discussion, we can add a BigArray or BigString to avoid a potential impact on performance. That's better than not having it; bioinformatics users can use it happily, even if it's very slow.
An Int64-based Array seems like the "general" case, as it handles more data without error, whereas the Int32-based Array feels like the "specific" case: reduced capacity, breaking some use cases, for a performance boost.
The "popular" case is separate from "general" and "specific". Either could be the more popular choice at a point in time, and that could change over time.
Anyway, I was curious about benchmarks with C++. I created basic Vector32 and Vector64 types (the usual (data, size, capacity) triplet), one using int32_t and the other int64_t for size, capacity and indexing. Both hold a T = struct Point { double x, y; }.
Iteration of either using pointer increment was the same. Iteration with at(i), where i was int32_t or int64_t matching the vector's size type, was also the same.
But when I say "the same", they varied wildly per run, outperforming each other back and forth between 1% and 50%. Could be CPUs hitting sustained throughput and/or throttling. I'm on an Apple M2 Max notebook with all apps quit except the terminal.
My conclusion is that 32-bit is not a speed advantage on a 64-bit Apple ARM chip, at least in the C++ ecosystem.
Here is the heart of the benchmark:
template<typename VectorType>
double benchmark(const char* vector_name, int size) {
  VectorType vector;
  for (int64_t i = 0; i < size; ++i) {
    vector.push(Point{static_cast<double>(i), static_cast<double>(i * 2)});
  }

  auto start = std::chrono::high_resolution_clock::now();
  using index_t = typename VectorType::index_t;
  double last = 0.0;
  for (index_t i = 0; i < vector.size(); ++i) {
    last = vector.at(i).x;
  }
  auto end = std::chrono::high_resolution_clock::now();

  auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
  std::cout << std::format("{} iteration time: {} microseconds\n", vector_name, duration.count());
  std::cout << std::format("Last: {} (to prevent optimization)\n", last);
  return duration.count();
}
Could there be another source for the difference, like the built-in Array was compiled with --release and BigArray was not?
It was in the same program. As far as I’m aware, it’s not possible to split compilation like that in Crystal. The benchmark code is in the post (just gotta pop open the details element) if you’d like to check my work.
Strong disagree. Array sizes over 2B elements are rare. I’ve been programming for over 30 years and I’ve never needed that many elements in a single in-memory data structure. To be clear, I have no doubt that those use cases exist, but a general-purpose data structure should be optimized for the general case and N is usually small.
While this is correct, unfortunately it isn’t relevant to the discussion. Arrays are used for a broad spectrum of things in Crystal programs and I’d bet my house that >99% of those use cases never exceed 1M elements. When I say “generally”, at the very least, I mean the vast majority of ways arrays are used rather than a few use cases that many people have.
The sequential access itself is fast, but this and the intermittent array reallocations keep invalidating your entire L3 cache, hence the erratic run times (don't forget that all other processes on your system are also competing for the cache). The L3 cache size is most certainly well below Int32::MAX, so Int32 and Int64 shouldn't make a big difference in that benchmark.
One important perspective here is that 32-bit linear memory was rare when C/C++ originally appeared, so it was important that size_t could be smaller than 32 bits, so as not to waste any upper bits. It is not a suggestion that making it 64 bits on a 64-bit system would make the same data structures or algorithms equally fast. This is actually a problem for AVR as well if one tries to use Array directly there; Array#size won't become an Int64, and if it does become a platform-dependent integer type, it would be due to support for legacy / embedded targets, rather than performance.
There is also the problem that the larger a piece of contiguous allocation is, the more false positive references to it the Boehm GC will identify, and your block of memory will reside longer than necessary. Already the standard library specs allocate several 1 GiB buffers and often they simply don’t go away before the whole suite finishes.
A genuine application for 64-bit sizes is when dealing with memory-mapped I/O, which doesn't involve allocations; the best way to represent that data is via some kind of Slice with a 64-bit size field. See also: `File.read`ing a file bigger than 2GB results in an `OverflowError` · Issue #14485 · crystal-lang/crystal · GitHub
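As a sketch of that idea, a hypothetical 64-bit slice type might look like this (Slice64 is an invented name; the stdlib Slice is Int32-sized today, and the malloc'ed buffer below merely stands in for an mmap'ed region):

```crystal
# Hypothetical: a read-only view with a 64-bit size, e.g. over mmap'ed memory.
struct Slice64(T)
  getter size : Int64

  def initialize(@pointer : Pointer(T), @size : Int64)
  end

  def [](index : Int64) : T
    raise IndexError.new if index < 0 || index >= @size
    @pointer[index]
  end
end

bytes = Pointer(UInt8).malloc(16) # stand-in for a pointer returned by mmap
view = Slice64(UInt8).new(bytes, 16_i64)
puts view.size    # 16
puts view[0_i64]  # 0 (Pointer.malloc zero-initializes)
```

Since no allocation is owned by the view, such a type sidesteps both the GC false-reference problem and the Int32 size limit.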