Array.combinations issues

I can run this code in Ruby (MRI, Truffleruby, JRuby) and get immediate results.

(1..60).to_a.combination(53).size # => 386206920

When I run the equivalent in Crystal (1.1.0) it eats up my 16GB of memory and gives
Program was killed

Crystal code
puts (1..60).to_a.combinations(53).size

If I use smaller values it will produce correct results.

puts (1..20).to_a.combination(13).size # => 77520

1 Like

You can get the correct result using an iterator instead of creating an array that has loads and loads of elements.

puts (1..60).to_a.each_combination(53).size # => 386206920

It still takes quite a while (83s) but doesn’t use much memory, with the max being less than 1mb. Maybe there’s room for some optimization there.

1 Like

That’s really interesting because the code for combinations uses each_combination.

Also, the Ruby version is lazy, which I suspect allows it to just accumulate each combination without eating up memory.

Yes exactly. The Crystal code I provided is also lazy which is why it doesn’t use much memory at all.

A common convention in Crystal is that you can get an iterator by invoking a method that normally takes a block, but without passing one. I.e. the version of each_combination in your link is using a block so it’s not the iterator version of it. While calling it as I did w/o the block returns an iterator.

Reference: Bottom of the class docs of Iterator(T) - Crystal 1.1.0

The difference compared to Ruby is that for this snippet, all I want is the number of different combinations, and not their values, but to get that Crystal stores the individual values, and Ruby doesn’t. I tried to see if I could get around having to store arrays of combinations but haven’t seen how to do that with existing methods.

Is that not just this?

It looks like the issue here is simply that the iterator implementation (Indexable::CombinationIterator) does not have a dedicated #size method which could give you the result right away. Instead, it falls back to Enumerable#size which iterates through all items (count { true }).

With a direct implementation of #size this works really well and fast:

p! (1..60).to_a.each_combination(53).size # => 386206920

module Indexable
  class CombinationIterator
    def size
      binomial_coefficient(@n, @size)
    end

    def binomial_coefficient(n, k)
      return 0 if k > n
      return 1 if k == 0 || k == n

      return binomial_coefficient(n - 1, k - 1) + binomial_coefficient(n - 1, k)
    end
  end
end

Losely related issues (why this implementation is not enforced):

5 Likes

Thank you @straight-shoota!

Using your code, I created a build .. --release binary which took 1.7 secs.

Still not a fast as Ruby, but sooooo much better than before.

I’m glad to see this|these issues have been raised before and hopefully can be greatly improved soon.

FYI. I was testing 1.1.0 on some Rosetta Code examples.
The combination|permutation code here is exponentially faster.

https://rosettacode.org/wiki/Combinations_and_permutations#Crystal

I was trying to see if that Ruby code snippet worked now in Crystal.

https://rosettacode.org/wiki/Combinations_and_permutations#Ruby

The binomial_coefficient algorithm is just a very crude and inefficient recursive formula. Using the Rosetta code gives a result almost instantaneously. But even that is still far from highly optimized.
Just as an example, this implementation is an order of magnitude faster:

def binomial_coefficient(n, k)
  k = Math.min(k, n - k)
  (1..k).product(1.to_big_i) {|i| (n + 1 - i)} // (1..k).product(1.to_big_i)
end

Still not the end of optimization probably.

1 Like
  1. I don’t see how you can make binomial_coefficient(n, k) much faster. The results are immediate.

  2. There still is a problem with the original Ruby equivalent variant:
    (1..60).to_a.combinations(53).size

It could certainly be much faster if we could avoid big integer allocations. Either get it working entirely without bigint, or at least without allocating lots of intermediary values (Add mutating methods to BigInt by Exilor · Pull Request #9825 · crystal-lang/crystal · GitHub).

I’m wondering if we could get away with floating point arithmetics or some other tricks :thinking:

  1. OK. I was thinking more of the algorithm vs number conversions.

  2. What about making combinations method on arrays just as fast too?

I don’t follow what you mean by 2.
Indexable#combinations materializes an array with all combinations. It can’t be fast because it really needs to iterate and collect all combinations. #each_combination is the lazy variant and faster if you only want the size.

1 Like

That most voted answer seems to be incomplete. This one is probably better: c++ - Calculating the Amount of Combinations - Stack Overflow