Segmented Sieve Memory Usage

I’m messing around with implementing and comparing different prime number generators, including prime sieves, after becoming interested in @jzakiya’s Sieve of Zakiya. I haven’t gotten as far as complex sieves like that yet, though. I’m currently working on implementing varieties of the Sieve of Eratosthenes, and I’m stumped by my Segmented Sieve of Eratosthenes using much more memory than my standard Sieve of Eratosthenes implementation.

Here are my implementations:

def sieve_of_eratosthenes(upto n)
  possible_primes = BitArray.new n + 1, initial: true
  possible_primes[0] = false
  possible_primes[1] = false

  (2..).take_while { |i| i <= Math.sqrt(n) }.each do |i|
    if possible_primes[i]?
      (i**2).upto(n).step(i).each do |j|
        possible_primes[j] = false
      end
    end
  end

  possible_primes.each_index.select { |index| possible_primes[index] }.to_a
end

def segmented_sieve_of_eratosthenes(upto n)
  delta = Math.sqrt(n).to_i

  known_primes = sieve_of_eratosthenes upto: delta

  possible_primes = BitArray.new delta
  (delta + 1).upto(n).step(delta).each do |segment_min|
    segment_max = segment_min + delta - 1
    prime_divisor_limit = Math.sqrt(segment_max)
    relevant_prime_divisors = known_primes.each.take_while { |prime| prime <= prime_divisor_limit }
    possible_primes.fill true

    pre_segment_min = segment_min - 1
    relevant_prime_divisors.each do |p|
      starting_index = p - (pre_segment_min % p) - 1
      starting_index.upto(delta - 1).step(p).each do |j|
        possible_primes[j] = false
      end
    end

    possible_primes.each_index do |index|
      if possible_primes[index]
        known_primes << (segment_min + index)
      end
    end
  end

  known_primes
end

The segmented sieve is using around 5 times as much memory as the basic version*. I initially noticed a difference in memory consumption when using Benchmark.ips, and since then I’ve just been checking with GC.stats.total_bytes at the end of a run (switching implementations and recompiling to compare).

I’m not sure whether adding GC.stats.total_bytes checks around the code is a valid way to check what’s causing memory allocation, but when I did that it seemed like the if possible_primes[index] line toward the bottom of the segmented sieve implementation was allocating a bunch of memory, which seems… wrong.

So basically I’m wondering if anyone else has an idea of why I’m seeing more memory usage in the segmented sieve than in the basic one. If you have a critique of the algorithm re runtime performance or correctness, I guess that’s fine, but it’s not really what I’m looking for.

If it matters, I’m running Crystal 1.2.2 with LLVM 10.0.0 on OpenSUSE Tumbleweed.

* 36.7MiB vs 7.27MiB for n = 10,000,000, 94.9kiB vs 26.5kiB for n = 10,000

Seems like a reasonable way to test. Maybe you can check total_bytes within the code to see where it spikes?

Yeah, here’s the code with #total_bytes checks throughout and the code that creates that output: gist. The non-segmented version takes only 10.1 KiB in this case, while the segmented version takes 146 KiB. I did intentionally choose a small n in order to lessen the amount of output to sift through, but the results have been similar for larger n.

EDIT: I goofed with my #total_bytes output and was creating a bunch of strings which were taking memory. I’ve updated the gist to reflect a better way of outputting that value without ruining the usefulness of it. The difference in memory consumption is not nearly so much now, but the segmented version is still using more memory: 11.7 KiB vs the aforementioned 10.1 KiB for the non-segmented version. The values I gave in the original post for larger n are still valid.

I’m not quite sure how to debug it. Maybe add similar debugs to the original and look for differences? Does GC.collect help?