You can make it use much less memory by creating only as many fibers as there are threads, sending the batches through a Channel({Int64, Int64}), and having each fiber loop to receive the next batch. It may even be a bit faster, especially if the channel buffer size equals the number of batches.
Notice that with the one-fiber-per-batch approach, larger batches appear faster only because they mean fewer fibers get created. With a fixed set of worker fibers it is the opposite: small batches (e.g. 1000) are faster than larger ones.
require "wait_group"

# Count the steps of a single Collatz sequence, using the wrapping
# operators (&+, &*) to skip overflow checks in the hot loop.
def collatz(seed : Int64)
  steps = 0_i64
  while seed > 1
    while seed % 2 == 0
      steps &+= 1
      seed //= 2
    end
    if seed > 1
      steps &+= 1
      seed = seed &* 3 &+ 1
    end
  end
  steps
end

n = ENV.fetch("N", "1_000_000_000").to_i64(underscore: true)
b = ENV.fetch("B", "1_000").to_i64(underscore: true)
w = ENV.fetch("CRYSTAL_WORKERS", "4").to_i

# Buffered channel with one slot per batch, so the producer never blocks.
channel = Channel({Int64, Int64}).new((n // b + 1).to_i32)
wg = WaitGroup.new(w)

# Fixed pool of worker fibers: receive? returns nil once the channel is
# closed and drained, which ends each worker's loop.
w.times do |i|
  spawn(name: "WORKER-#{i}") do
    while r = channel.receive?
      (r[0]...r[1]).each do |seed|
        steps = collatz(seed)
        if seed % 1_000_000 == 0
          print "Seed: #{seed} Steps: #{steps}\r"
        end
      end
    end
    wg.done
  end
end

elapsed = Time.measure do
  s = n // b
  r0 = 0_i64
  s.times do
    r1 = r0 &+ b
    channel.send({r0, r1})
    r0 = r1
  end
  # Send the remainder when n isn't an exact multiple of b.
  if n - b &* s > 0
    channel.send({r0, n})
  end
  channel.close
  wg.wait
end
puts "\ncollatz took: #{elapsed}"
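For completeness, assuming the snippet is saved as collatz.cr (a hypothetical filename), it could be built and run roughly like this; --release and -Dpreview_mt matter for meaningful timings, since the multi-threaded scheduler is what CRYSTAL_WORKERS controls:

```shell
# Hypothetical filename; --release enables optimizations and
# -Dpreview_mt enables the multi-threaded scheduler.
crystal build --release -Dpreview_mt collatz.cr

# 1_000_000_000 seeds in batches of 1_000 across 8 worker threads.
N=1_000_000_000 B=1_000 CRYSTAL_WORKERS=8 ./collatz
```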
As for timings, on my old i7 1472HQ the Go example takes 6m30s, the direct Crystal port 5m30s, and the above 1m30s (8 threads).