There is a way to optimize this program?

Test my runnable code use latest Crystal night build.

 ╰──➤ $ cr version
Crystal 1.15.0-dev [d57c9d84d] (2024-11-20)

LLVM: 18.1.8
Default target: x86_64-pc-linux-gnu

It’s seem like original ysbaddaden version no big different, but, my changed version (small batches which equals to core number, with a large batch_size), slow a lot than before. time took: from 13s → 21s


Following is the result when I change the run order of two version.

 ╰──➤ $ 1  CRYSTAL_WORKERS=16 ./1
(@zw963 changed version)small batches, with a large batch_size, result is unpredictable, and slower(most of time).
total_seeds # => 1000000000
batch_size # => 62500000
worker_size # => 16
batches # => 16
Seed: 690000000 Steps: 195
Seed: 815000000 Steps: 110
Seed: 941000000 Steps: 211
Seed: 316000000 Steps: 199
Seed: 879000000 Steps: 136
Seed: 4000000 Steps: 15469
Seed: 191000000 Steps: 170
Seed: 812000000 Steps: 795
collatz took: 00:00:21.662288613
----------------------------------------------------------------------------------------------------
(@ysbaddaden original version)large batches, with a small batch_size, result correct, and faster.
total_seeds # => 1000000000
batch_size # => 1000
worker_size # => 16
batches # => 1000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.697945316
 ╰──➤ $ CRYSTAL_WORKERS=16 ./1
(@ysbaddaden original version)large batches, with a small batch_size, result correct, and faster.
total_seeds # => 1000000000
batch_size # => 1000
worker_size # => 16
batches # => 1000000
Seed: 999000000 Steps: 162
collatz took: 00:00:12.638324348
----------------------------------------------------------------------------------------------------
(@zw963 changed version)small batches, with a large batch_size, result is unpredictable, and slower(most of time).
total_seeds # => 1000000000
batch_size # => 62500000
worker_size # => 16
batches # => 16
Seed: 937000000 Steps: 224
collatz took: 00:00:21.383333880

I want to prove you’re right, tried many times, but, got a completely opposite conclusion:

if the channel buffer size is the number of batches, the performance is the worst.

You’re an expert in this area, maybe I misunderstood your meaning of batches?
or, you said is wrong?

Following is the example code, The worst performing is the last, you can run it like CRYSTAL_WORKERS=16 crystal run --release --Dpreview_mt 1.cr

require "wait_group"

def collatz(seed : Int64)
  steps = 0_i64

  while seed > 1
    while seed % 2 == 0
      steps &+= 1
      seed //= 2
    end

    if seed > 1
      steps &+= 1
      seed = seed &* 3 &+ 1
    end
  end

  steps
end

def calculate(total_seeds, batch_size, worker_size, batches)
  channel = Channel({Int64, Int64}).new(batches + 1)
  wg = WaitGroup.new(worker_size)

  p! batch_size
  p! batches

  worker_size.times do |i|
    spawn(name: "WORKER-#{i}") do
      while (r = channel.receive?)
        (r[0]...r[1]).each do |seed|
          steps = collatz(seed)

          if seed % 1_000_000 == 0
            print "Seed: #{seed} Steps: #{steps}\r"
          end
        end
      end
      wg.done
    end
  end

  start = Time.measure do
    r0 = 0_i64

    batches.times do
      r1 = r0 &+ batch_size
      channel.send({r0, r1})
      r0 = r1
    end

    if total_seeds - batch_size &* batches > 0
      channel.send({r0, total_seeds})
    end

    channel.close
    wg.wait
  end

  puts "\ncollatz took: #{start}"
end

# ---------------- common part ----------------

total_seeds = 1_000_000_000_i64
worker_size = ENV.fetch("CRYSTAL_WORKERS").to_i

p! total_seeds
p! worker_size

puts "-"*100

batch_size = 1000_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100

batch_size = 2000_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100

batch_size = 5000_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100

batch_size = 500_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100

batch_size = 100_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100

batches = worker_size
batch_size = (total_seeds // batches).to_i32
calculate(total_seeds, batch_size, worker_size, batches)