BitArray size limitation

Currently bit array sizes are limited to 2**31 - 1 = 2147483647
I have an application needing > 2**32 bits, and it would be nice to use bit_array for it.

Please consider increasing the max limit, as I just upped my laptop memory to 40GB to handle it.

This has been discussed in the past. The proposed solution is that if you need it for 64 bits you copy the implementation and change Int32 to Int64. You could also write a shard for it. Something like BitArray64.

4 Likes

Is it possible to create 2 versions in the std library, so that its sufficiently maintained and tested for new Crystal versions?

As hardware memory capacities increase, and as Crystal will eventually accommodate 128-bit values, it seems the better thing to do, to maintain verified compatibility across the board, is to add this feature as part of the std library.

I strongly disagree with the implication that as hardware memory capabilities increase programming languages ought to assume that programs should use more memory by default. However, I think that whole discussion is tangential to the two main issues here.

First, what you’re requesting is a maintenance nightmare. The core development team and everyone else who contributes are making steady, admirable progress in a number of useful features, and trying to maintain a separate standard library with a different default integer type would be, in my opinion, a whole lot of wasted effort. If you insist on using BitArray, it might make sense to request a generic version that allows you to specify the size type. The implementation should be possible, though I expect it would require some slightly wonky macros.

Second, a BitArray becomes less and less useful for your prime sieve as the numbers you’re considering grow. Since the ratio of primes to non-primes goes to zero as you consider larger and larger numbers, you’re using less and less of your BitArray as the numbers grow. Consequently, you’d be served much better by a sparse implementation (which, crucially, is outside the scope of the standard library). I’m pretty sure I saw an interesting implementation of a sparse bit array/set/vector recently in another language, but for now you should take a look at this Java implementation of a sparse BitSet. I think you’ll find the included PDF very interesting as well. Obviously, you would need to reimplement it in Crystal in order to use it, but it’s a single file in which the majority of the lines are comments. If I can find the other thing I saw recently, I’ll add it to this thread so you can take a look at it.

My general point is that your use case is specialized, so the needs you have in terms of data structures are also specialized. The standard library is good for a lot of things, but it can’t be ideal for every possible use case.

EDIT: Upon looking further at the linked sparse bit set implementation, I don’t think it’s appropriate for your application. I still think that a sparse representation will serve you better, and I’ll still update this thread if I find that other thing I saw before.

EDIT 2: I found what I remembered seeing: Roaring Bitmaps. However, it’s not sparse, just compressed, and the documentation explicitly indicates that it’s inappropriate for sparse cases.

1 Like

I don’t think the proposal was to have two versions of stdlib, but two versions of BitArray (32 and 64 bit) in stdlib.
That should be reasonably maintainable, as well as a generic type.
But considering that this is a relatively rare subject, I think it would perfectly fit in a shard.

2 Likes

Ah, I see. I clearly misread the post originally. Sorry about that, @jzakiya.