BitArray size limitation

jzakiya · July 14, 2022, 2:41am

Currently, BitArray sizes are limited to 2**31 - 1 = 2147483647 bits, because `size` is an Int32.
I have an application needing > 2**32 bits, and it would be nice to use BitArray for it.

Please consider increasing the max limit, as I just upped my laptop memory to 40GB to handle it.
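To put numbers on the request, here is a rough sketch of the arithmetic in Crystal. The variable names are only for illustration, and the byte count is raw storage at one bit per flag, ignoring any allocator overhead.

```crystal
# Largest size the stdlib BitArray accepts, since `size` is an Int32.
max_bits = Int32::MAX.to_i64 # 2**31 - 1

# Bits the application needs, computed in 64-bit arithmetic
# (2 ** 32 with Int32 operands would overflow).
needed_bits = 2_i64 ** 32

# Raw storage required at one bit per flag.
needed_bytes = needed_bits // 8
needed_mib = needed_bytes // (1024 * 1024)

puts "max BitArray size: #{max_bits} bits"
puts "needed: #{needed_bits} bits (#{needed_mib} MiB)"
```

So 2**32 bits is only 512 MiB of storage; the limit comes from the Int32 size type, not from available memory.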

github.com

crystal-lang/crystal/blob/994c70b10/src/bit_array.cr#L18


      
#
# ```
# require "bit_array"
#
# ba = BitArray.new(12) # => "BitArray[000000000000]"
# ba[2]                 # => false
# 0.upto(5) { |i| ba[i * 2] = true }
# ba    # => "BitArray[101010101010]"
# ba[2] # => true
# ```
struct BitArray
  include Indexable::Mutable(Bool)

  # The number of bits the BitArray stores
  getter size : Int32

  # Creates a new `BitArray` of *size* bits.
  #
  # *initial* optionally sets the starting value, `true` or `false`, for all bits
  # in the array.
  def initialize(size : Int, initial : Bool = false)

asterite · July 14, 2022, 12:16pm

This has been discussed in the past. The proposed solution is that if you need 64-bit sizes, you copy the implementation and change Int32 to Int64. You could also write a shard for it, something like BitArray64.
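As a rough illustration of that suggestion, here is a minimal sketch of what such a type might look like. `BitArray64` and its `locate` helper are hypothetical names, this is not the stdlib code, and it only covers construction plus `[]` and `[]=` (no `Indexable` methods, no iteration).

```crystal
# Hypothetical BitArray64: a sketch only, not the stdlib implementation.
struct BitArray64
  # The number of bits stored, as an Int64 instead of an Int32.
  getter size : Int64

  @bits : Pointer(UInt32)

  def initialize(size : Int, initial : Bool = false)
    raise ArgumentError.new("Negative bit array size: #{size}") if size < 0
    @size = size.to_i64
    # One UInt32 word holds 32 bits; round up to whole words.
    words = (@size + 31) // 32
    # Padding bits in the last word are also set when initial is true,
    # which is harmless here because nothing reads past `size`.
    @bits = Pointer(UInt32).malloc(words, initial ? UInt32::MAX : 0_u32)
  end

  def [](index : Int) : Bool
    word, bit = locate(index)
    (@bits[word] & (1_u32 << bit)) != 0
  end

  def []=(index : Int, value : Bool) : Bool
    word, bit = locate(index)
    if value
      @bits[word] |= 1_u32 << bit
    else
      @bits[word] &= ~(1_u32 << bit)
    end
    value
  end

  # Maps a (possibly negative) index to a word offset and a bit position.
  private def locate(index : Int) : {Int64, Int64}
    i = index.to_i64
    i += @size if i < 0
    raise IndexError.new unless 0 <= i < @size
    {i // 32, i % 32}
  end
end

ba = BitArray64.new(100)
ba[99] = true
puts ba[99]
puts ba[0]
```

With this shape, something like `BitArray64.new(2_i64 ** 33)` would request about 1 GiB of storage, which the Int32-sized version cannot express.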

jzakiya · July 15, 2022, 2:55pm

Is it possible to have two versions in the std library, so that it's sufficiently maintained and tested for new Crystal versions?

As hardware memory capacities increase, and as Crystal will eventually accommodate 128-bit values, it seems the better thing to do, to maintain verified compatibility across the board, is to add this feature as part of the std library.

RespiteSage · July 15, 2022, 4:27pm

I strongly disagree with the implication that as hardware memory capabilities increase programming languages ought to assume that programs should use more memory by default. However, I think that whole discussion is tangential to the two main issues here.

First, what you’re requesting is a maintenance nightmare. The core development team and everyone else who contributes are making steady, admirable progress on a number of useful features, and trying to maintain a separate standard library with a different default integer type would be, in my opinion, a whole lot of wasted effort. If you insist on using BitArray, it might make sense to request a generic version that allows you to specify the size type. The implementation should be possible, though I expect it would require some slightly wonky macros.

Second, a BitArray becomes less and less useful for your prime sieve as the numbers you’re considering grow. Since the ratio of primes to non-primes goes to zero as you consider larger and larger numbers, you’re using less and less of your BitArray as the numbers grow. Consequently, you’d be served much better by a sparse implementation (which, crucially, is outside the scope of the standard library). I’m pretty sure I saw an interesting implementation of a sparse bit array/set/vector recently in another language, but for now you should take a look at this Java implementation of a sparse BitSet. I think you’ll find the included PDF very interesting as well. Obviously, you would need to reimplement it in Crystal in order to use it, but it’s a single file in which the majority of the lines are comments. If I can find the other thing I saw recently, I’ll add it to this thread so you can take a look at it.
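To make the density point concrete, here is a small sketch that counts primes with the stdlib BitArray and prints what fraction of the range is prime. `prime_count` is a hypothetical helper written for this post, and it is a plain sieve of Eratosthenes, not necessarily the sieve used in the application under discussion.

```crystal
require "bit_array"

# Counts primes up to `limit` (inclusive) with a plain sieve of Eratosthenes.
def prime_count(limit : Int32) : Int32
  return 0 if limit < 2
  composite = BitArray.new(limit + 1) # bit i set => i is composite
  count = 0
  2.upto(limit) do |n|
    next if composite[n]
    count += 1
    # Mark multiples starting at n * n; Int64 avoids Int32 overflow.
    m = n.to_i64 * n
    while m <= limit
      composite[m] = true
      m += n
    end
  end
  count
end

[1_000, 1_000_000].each do |limit|
  primes = prime_count(limit)
  puts "#{limit}: #{primes} primes, density #{(primes / limit).round(4)}"
end
```

There are 168 primes up to 1,000 and 78,498 up to 1,000,000, so the share of bits that end up representing primes drops from roughly 17% to under 8% over that range, and it keeps shrinking as the limit grows.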

My general point is that your use case is specialized, so the needs you have in terms of data structures are also specialized. The standard library is good for a lot of things, but it can’t be ideal for every possible use case.

EDIT: Upon looking further at the linked sparse bit set implementation, I don’t think it’s appropriate for your application. I still think that a sparse representation will serve you better, and I’ll still update this thread if I find that other thing I saw before.

EDIT 2: I found what I remembered seeing: Roaring Bitmaps. However, it’s not sparse, just compressed, and the documentation explicitly indicates that it’s inappropriate for sparse cases.

straight-shoota · July 15, 2022, 7:53pm

I don’t think the proposal was to have two versions of stdlib, but two versions of BitArray (32 and 64 bit) in stdlib.
That should be reasonably maintainable, as would a generic type.
But considering that this is a relatively rare use case, I think it would fit perfectly in a shard.

RespiteSage · July 15, 2022, 8:06pm

Ah, I see. I clearly misread the post originally. Sorry about that, @jzakiya.
