I’m wondering if memory isn’t your problem.
Your program spawns an unbounded number of fibers which requires to allocate a bunch of small objects into memory + as many fiber stacks that are a minimum of the OS page size (4KB on Linux). This can be replaced with a fixed number of fibers, for instance the number of CPU and use a Channel to push the work.
You also don’t need i
nor the lastwins
and cnts
arrays: you can just safely compute as you receive the values from the done
fiber.
Last but not least, the concurrent atomic add + print
from each fiber is killing the performance. Reporting after done.receive
allows to skip the atomic and may avoid OS locking stdout (writing to stdout below PIPE_BUF is guaranteed to be atomic on posix targets, hence concurrent writes are blocking each others).
All these changes reduce the execution time from over a minute down to 25s for 100_000_000_000
on my laptop with 8 crystal workers (== System.cpu_count
).
237c237,238
< puts "threads = #{System.cpu_count}"
---
> nthreads = System.cpu_count.to_i32
> puts "threads = #{nthreads}"
256,258c257,259
< cnts = Array(UInt64).new(pairscnt, 0) # number of twinprimes found per thread
< lastwins = Array(UInt64).new(pairscnt, 0) # largest twinprime val for each thread
< done = Channel(Nil).new(pairscnt)
---
> # start a fixed number of fibers to avoid bloating memory
> chan = Channel(Int32).new(nthreads * 32)
> done = Channel({UInt64, UInt64}).new(nthreads) # NOTE: increase buffer size => faster but reports every once in a while
260,261c261
< threadscnt = Atomic.new(0) # count of finished threads
< restwins.each_with_index do |r_hi, i| # sieve twinpair restracks
---
> nthreads.times do
263,268c263,282
< lastwins[i], cnts[i] = twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
< print "\r#{threadscnt.add(1)} of #{pairscnt} twinpairs done"
< done.send(nil)
< end end
< pairscnt.times { done.receive } # wait for all threads to finish
< print "\r#{pairscnt} of #{pairscnt} twinpairs done"
---
> while r_hi = chan.receive? # sieve twinpair restracks
> twin, cnt = twins_sieve(r_hi, kmin, kmax, ks, start_num, end_num, modpg, primes, resinvrs)
> done.send({twin, cnt})
> end
> end
> end
>
> # push work from another channel to avoid blocking main fiber
> spawn do
> restwins.each { |r_hi| chan.send(r_hi) }
> end
>
> threadscnt = 0
> last_twin = 0_u64
> pairscnt.times do
> twin, cnt = done.receive
> last_twin = twin if twin > last_twin # find largest hi_tp twinprime in range
> twinscnt += cnt # compute number of twinprimes in range
> print "\r#{threadscnt += 1} of #{pairscnt} twinpairs done"
> end
270,271d283
< last_twin = lastwins.max # find largest hi_tp twinprime in range
< twinscnt += cnts.sum # compute number of twinprimes in range
283a296
>