I think you misunderstood me. I was not arguing against heaps as a whole (and the direction obviously doesn't matter), just against the classic textbook implementation with really complex rules, where everything is allocated either in a single array or in pairs of two. There are plenty of heap variations that are faster, and you have obviously found some :)
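To be clear, by "the classic textbook implementation" I mean something like the following single-array layout, with the children of index i at 2i + 1 and 2i + 2. This is just an illustration (the TextbookHeap name is made up here), not code from any of the proposals in this thread:

# Purely illustrative: the whole heap lives in one flat array, children of
# index i sit at 2i + 1 and 2i + 2, the parent at (i - 1) // 2.
class TextbookHeap(T)
  def initialize
    @data = [] of T
  end

  def push(value : T)
    @data << value
    # Sift the new element up toward the root.
    i = @data.size - 1
    while i > 0 && @data[i] < @data[(i - 1) // 2]
      @data.swap(i, (i - 1) // 2)
      i = (i - 1) // 2
    end
  end

  def pop : T?
    return nil if @data.empty?
    @data.swap(0, @data.size - 1)
    top = @data.pop
    # Sift the displaced element back down.
    i = 0
    while (child = 2 * i + 1) < @data.size
      child += 1 if child + 1 < @data.size && @data[child + 1] < @data[child]
      break if @data[i] <= @data[child]
      @data.swap(i, child)
      i = child
    end
    top
  end
end

heap = TextbookHeap(Int32).new
heap.push(5)
heap.push(1)
heap.push(3)
p heap.pop # => 1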
#14996 is ready to merge.
We have summarized the changes in RFC 0009. Comments are appreciated.
Not all of it is implemented yet, but the current state of the PR should be good enough to roll it out for testing in master. We have a couple of follow-ups in the pipeline to complete the missing pieces.
And we’re preparing an announcement for the website:
The new event loop has been merged into master and is available in nightly builds.
There’s a very rough test here.
It’s show the performance has a big impact when run CPU bound application with mt support enabled if small batches(equals to core number) with a large batch_size is used.
EDIT:
Following is an example:
Assume we have 10000 parallel tasks and run them on an 8-core, 16-thread laptop. In both solutions CRYSTAL_WORKERS is set to 16, so 16 fibers run in parallel.
Solution one: we create 16 buffered channels, each with a buffer size of 625.
Solution two: we create 100 buffered channels, each with a buffer size of 100.
At least in my test, I saw a significant performance decrease for solution one, but not for solution two.
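Roughly, the two setups look like this. This is only a simplified sketch, not the exact code I ran: the run helper, the one-fiber-per-channel layout and the placeholder workload are just for illustration.

require "wait_group"

# Sketch: N buffered channels, each sized to hold its share of the
# 10000 tasks, with one consuming fiber per channel.
def run(channel_count : Int32, channel_size : Int32)
  channels = Array.new(channel_count) { Channel(Int32).new(channel_size) }
  wg = WaitGroup.new(channel_count)

  channels.each do |ch|
    spawn do
      while task = ch.receive?
        Math.sqrt(task.to_f) # stand-in for the real CPU-bound work
      end
      wg.done
    end
  end

  task_id = 0
  channels.each do |ch|
    channel_size.times do
      ch.send(task_id)
      task_id += 1
    end
    ch.close
  end
  wg.wait
end

run(16, 625)   # solution one: 16 channels, buffer size 625 each
run(100, 100)  # solution two: 100 channels, buffer size 100 each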
Perhaps I used it incorrectly?
Okay, I created a new code snippet to test the MT performance when creating buffered channels of different sizes. Following are the code and results:
Code:

require "wait_group"

# Count the Collatz steps for a given seed.
def collatz(seed : Int64)
  steps = 0_i64
  while seed > 1
    while seed % 2 == 0
      steps &+= 1
      seed //= 2
    end
    if seed > 1
      steps &+= 1
      seed = seed &* 3 &+ 1
    end
  end
  steps
end

# Send `batches` ranges of `batch_size` seeds over one buffered channel
# and let `worker_size` fibers consume them.
def calculate(total_seeds, batch_size, worker_size, batches)
  channel = Channel({Int64, Int64}).new(batches + 1)
  wg = WaitGroup.new(worker_size)
  p! batch_size
  p! batches

  worker_size.times do |i|
    spawn(name: "WORKER-#{i}") do
      while r = channel.receive?
        (r[0]...r[1]).each do |seed|
          steps = collatz(seed)
          if seed % 1_000_000 == 0
            print "Seed: #{seed} Steps: #{steps}\r"
          end
        end
      end
      wg.done
    end
  end

  start = Time.measure do
    r0 = 0_i64
    batches.times do
      r1 = r0 &+ batch_size
      channel.send({r0, r1})
      r0 = r1
    end
    # Send the remainder that doesn't fill a whole batch, if any.
    if total_seeds - batch_size &* batches > 0
      channel.send({r0, total_seeds})
    end
    channel.close
    wg.wait
  end
  puts "\ncollatz took: #{start}"
end

# ---------------- common part ----------------
total_seeds = 1_000_000_000_i64
worker_size = ENV.fetch("CRYSTAL_WORKERS").to_i # raises if CRYSTAL_WORKERS is not set
p! total_seeds
p! worker_size

puts "-"*100
batch_size = 1000_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100
batch_size = 2000_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100
batch_size = 5000_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100
batch_size = 500_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100
batch_size = 100_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100
# Last configuration: as many batches as workers, each batch very large.
batches = worker_size
batch_size = (total_seeds // batches).to_i32
calculate(total_seeds, batch_size, worker_size, batches)
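Note: the snippet reads CRYSTAL_WORKERS from the environment (ENV.fetch raises if it is unset) and needs to be compiled with multithreading enabled, e.g. crystal build --release -Dpreview_mt 1.cr, then CRYSTAL_WORKERS=16 ./1 (the exact flags may vary between the releases and master).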
Following are the results when running on Crystal 1.13.2:
total_seeds # => 1000000000
worker_size # => 16
----------------------------------------------------------------------------------------------------
batch_size # => 1000
batches # => 1000000
Seed: 999000000 Steps: 162
collatz took: 00:00:12.577703068
----------------------------------------------------------------------------------------------------
batch_size # => 2000
batches # => 500000
Seed: 999000000 Steps: 162
collatz took: 00:00:12.815965558
----------------------------------------------------------------------------------------------------
batch_size # => 5000
batches # => 200000
Seed: 999000000 Steps: 162
collatz took: 00:00:12.833879590
----------------------------------------------------------------------------------------------------
batch_size # => 500
batches # => 2000000
Seed: 999000000 Steps: 162
collatz took: 00:00:12.944712438
----------------------------------------------------------------------------------------------------
batch_size # => 100
batches # => 10000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.189403820
----------------------------------------------------------------------------------------------------
batch_size # => 62500000
batches # => 16
Seed: 937000000 Steps: 224
collatz took: 00:00:14.090417122
Following are the results when running on Crystal 1.14.0:
╰──➤ $ ./1
total_seeds # => 1000000000
worker_size # => 16
----------------------------------------------------------------------------------------------------
batch_size # => 1000
batches # => 1000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.265471911
----------------------------------------------------------------------------------------------------
batch_size # => 2000
batches # => 500000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.537970427
----------------------------------------------------------------------------------------------------
batch_size # => 5000
batches # => 200000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.500822759
----------------------------------------------------------------------------------------------------
batch_size # => 500
batches # => 2000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.484737884
----------------------------------------------------------------------------------------------------
batch_size # => 100
batches # => 10000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.781218646
----------------------------------------------------------------------------------------------------
batch_size # => 62500000
batches # => 16
Seed: 999000000 Steps: 162
collatz took: 00:00:13.329298756
Following are the results when running on the latest master, 879ec1247:
total_seeds # => 1000000000
worker_size # => 16
----------------------------------------------------------------------------------------------------
batch_size # => 1000
batches # => 1000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.540662927
----------------------------------------------------------------------------------------------------
batch_size # => 2000
batches # => 500000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.590167441
----------------------------------------------------------------------------------------------------
batch_size # => 5000
batches # => 200000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.690674863
----------------------------------------------------------------------------------------------------
batch_size # => 500
batches # => 2000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.750579132
----------------------------------------------------------------------------------------------------
batch_size # => 100
batches # => 10000000
Seed: 999000000 Steps: 162
collatz took: 00:00:14.017306410
----------------------------------------------------------------------------------------------------
batch_size # => 62500000
batches # => 16
Seed: 687000000 Steps: 239
collatz took: 00:00:22.480855790
Performance has dropped significantly in the last configuration (few batches, each with a very large batch_size, sent over a buffered channel) when running with the latest master.