# Segmented Sieve Memory Usage

I’m messing around with implementing and comparing different prime number generators, including prime sieves, after becoming interested in @jzakiya’s Sieve of Zakiya. I haven’t gotten as far as complex sieves like that yet, though. I’m currently working on implementing varieties of the Sieve of Eratosthenes, and I’m stumped by my Segmented Sieve of Eratosthenes using much more memory than my standard Sieve of Eratosthenes implementation.

Here are my implementations:

```crystal
require "bit_array"

def sieve_of_eratosthenes(upto n)
  possible_primes = BitArray.new n + 1, initial: true
  # 0 and 1 are not prime
  possible_primes[0] = false
  possible_primes[1] = false

  (2..).take_while { |i| i <= Math.sqrt(n) }.each do |i|
    if possible_primes[i]?
      (i**2).upto(n).step(i).each do |j|
        possible_primes[j] = false
      end
    end
  end

  possible_primes.each_index.select { |index| possible_primes[index] }.to_a
end

def segmented_sieve_of_eratosthenes(upto n)
  delta = Math.sqrt(n).to_i

  known_primes = sieve_of_eratosthenes upto: delta

  possible_primes = BitArray.new delta
  (delta + 1).upto(n).step(delta).each do |segment_min|
    segment_max = segment_min + delta - 1
    prime_divisor_limit = Math.sqrt(segment_max)
    relevant_prime_divisors = known_primes.each.take_while { |prime| prime <= prime_divisor_limit }
    possible_primes.fill true

    pre_segment_min = segment_min - 1
    relevant_prime_divisors.each do |p|
      # Index of the first multiple of p within this segment
      starting_index = p - (pre_segment_min % p) - 1
      starting_index.upto(delta - 1).step(p).each do |j|
        possible_primes[j] = false
      end
    end

    possible_primes.each_index do |index|
      if possible_primes[index]
        known_primes << (segment_min + index)
      end
    end
  end

  known_primes
end
```
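For context on the index math in the segmented version: index `j` in a segment stands for the number `segment_min + j`, so the `starting_index` formula should land on the first multiple of `p` at or above `segment_min`. A tiny worked example (the concrete numbers are arbitrary, just for illustration):

```crystal
# Index j in a segment represents the number segment_min + j.
# Arbitrary example values: p = 3, segment starting at 11.
segment_min = 11
p = 3
pre_segment_min = segment_min - 1                # 10
starting_index = p - (pre_segment_min % p) - 1   # 3 - (10 % 3) - 1 = 1
puts segment_min + starting_index                # 12, the first multiple of 3 >= 11
```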

The segmented sieve is using around 5 times as much memory as the basic version*. I initially noticed a difference in memory consumption when using `Benchmark.ips`, and since then I’ve just been checking with `GC.stats.total_bytes` at the end of a run (switching implementations and recompiling to compare).

I’m not sure whether adding `GC.stats.total_bytes` checks around the code is a valid way to check what’s causing memory allocation, but when I did that it seemed like the `if possible_primes[index]` line toward the bottom of the segmented sieve implementation was allocating a bunch of memory, which seems… wrong.
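In case it helps anyone reproduce this, here’s roughly how I’ve been bracketing code with `GC.stats.total_bytes` (the `allocated` helper name is just something I made up for this sketch; note that `total_bytes` seems to be a cumulative allocation counter, not live heap size):

```crystal
require "bit_array"

# Made-up helper: report how many bytes were allocated while running a block.
# The puts happens after the second reading, so it shouldn't pollute the delta.
def allocated(label)
  before = GC.stats.total_bytes
  yield
  after = GC.stats.total_bytes
  puts "#{label}: #{after - before} bytes"
end

allocated("bit array") { BitArray.new 10_000_000, initial: true }
```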

So basically I’m wondering whether anyone else has an idea of why the segmented sieve uses more memory than the basic one. If you have a critique of the algorithm regarding runtime performance or correctness, I guess that’s fine, but it’s not really what I’m looking for.

If it matters, I’m running Crystal 1.2.2 with LLVM 10.0.0 on OpenSUSE Tumbleweed.

\* 36.7 MiB vs 7.27 MiB for n = 10,000,000; 94.9 KiB vs 26.5 KiB for n = 10,000

Seems like a reasonable way to test. Maybe you can check `total_bytes` within the code to see where it spikes?

Yeah, here’s the code with `#total_bytes` checks throughout and the code that creates that output: gist. The non-segmented version takes only 10.1 KiB in this case, while the segmented version takes 146 KiB. I did intentionally choose a small `n` in order to lessen the amount of output to sift through, but the results have been similar for larger `n`.

EDIT: I goofed with my `#total_bytes` output and was creating a bunch of strings which were taking memory. I’ve updated the gist to reflect a better way of outputting that value without ruining the usefulness of it. The difference in memory consumption is not nearly so much now, but the segmented version is still using more memory: 11.7 KiB vs the aforementioned 10.1 KiB for the non-segmented version. The values I gave in the original post for larger `n` are still valid.
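To illustrate the goof for anyone else measuring this way (a minimal sketch): building the output message itself allocates a `String`, which shows up in the very counter you’re reading. Writing the integer straight to the IO avoids the intermediate `String`, as far as I can tell:

```crystal
# Measurement-polluting version: interpolation heap-allocates a String.
before = GC.stats.total_bytes
message = "total: #{before} bytes"  # allocates a new String
after = GC.stats.total_bytes
puts after - before  # the reporting itself can move the counter

# What I switched to: write the integer directly to the IO,
# skipping the intermediate String.
STDOUT << GC.stats.total_bytes << '\n'
```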

I’m not quite sure how to debug it. Maybe add similar debugs to the original and look for differences? Does GC.collect help?
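One caveat on `GC.collect`: if `total_bytes` is a cumulative allocation counter (which I believe it is, coming from Boehm via `GC.stats`), collecting won’t lower it, since collection only frees live heap. Something like this sketch should show the counter never going down:

```crystal
# Sketch: total_bytes should be monotonic even across collections,
# assuming it counts cumulative allocations rather than live memory.
before = GC.stats.total_bytes
arr = Array.new(100_000) { 0 }  # force some allocation
GC.collect
after = GC.stats.total_bytes
puts after >= before  # expected to hold if the counter is cumulative
```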