New Event Loop (UNIX): call for reviews & tests

I think you misunderstood me. I was not arguing against heaps as a whole (and the direction obviously doesn’t matter), just against the classic textbook implementation with really complex rules, where everything is allocated either in a single array or in pairs of two. There are plenty of heap variations that are faster, and you have obviously found some :)
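As a rough illustration of such a variation (not the implementation used in the PR): a d-ary heap keeps the textbook array layout but only changes the parent/child index arithmetic, trading a shallower tree for a wider sift-down. The module and names below are purely illustrative:

# Sketch of d-ary heap index arithmetic (here d = 4); names are hypothetical.
module DAryHeap
  D = 4

  # Index of the parent of node `i`.
  def self.parent(i : Int32) : Int32
    (i - 1) // D
  end

  # Index of the k-th child of node `i` (0 <= k < D).
  def self.child(i : Int32, k : Int32) : Int32
    D &* i &+ k &+ 1
  end
end

p DAryHeap.child(0, 3) # => 4
p DAryHeap.parent(4)   # => 0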

#14996 is ready to merge.

We have summarized the changes in RFC 0009. Comments are appreciated.

Not all of it is implemented yet, but the current state of the PR should be good enough to roll it out for testing in master. We have a couple of follow-ups in the pipeline to complete the missing pieces.

And we’re preparing an announcement for the website:


The new event loop has been merged into master and is available in nightly builds.


There’s a very rough test here.

It shows that performance takes a big hit when running a CPU-bound application with MT support enabled, if a small number of batches (equal to the core count) with a large batch_size is used.


EDIT:

Following is an example:

Assume we have 10000 parallel tasks running on an 8-core, 16-thread laptop. In both solutions CRYSTAL_WORKERS is set to 16, so 16 fibers run in parallel.

Solution one: we create 16 buffered channels, each with a buffer size of 625.
Solution two: we create 100 buffered channels, each with a buffer size of 100.

At least in my test, I saw a significant performance decrease for solution one, but not for solution two.
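For reference, a minimal sketch of the two channel setups (the element type and variable names are my assumptions; the original test code for this comparison is not shown):

# Solution one (sketch): 16 buffered channels, buffer size 625 each.
channels_one = Array.new(16) { Channel(Int64).new(625) }

# Solution two (sketch): 100 buffered channels, buffer size 100 each.
channels_two = Array.new(100) { Channel(Int64).new(100) }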

Perhaps I used it incorrectly?

Okay, I created a new code snippet to test MT performance when creating buffered channels of different sizes. Following are the code and results:

Code:
require "wait_group"

def collatz(seed : Int64)
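  # Count the Collatz steps (halve when even, 3n+1 when odd) until `seed` reaches 1.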
  steps = 0_i64

  while seed > 1
    while seed % 2 == 0
      steps &+= 1
      seed //= 2
    end

    if seed > 1
      steps &+= 1
      seed = seed &* 3 &+ 1
    end
  end

  steps
end

def calculate(total_seeds, batch_size, worker_size, batches)
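  # Split `total_seeds` into `batches` ranges of `batch_size` seeds and feed them
  # to `worker_size` worker fibers through a single buffered channel.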
  channel = Channel({Int64, Int64}).new(batches + 1)
  wg = WaitGroup.new(worker_size)

  p! batch_size
  p! batches

  worker_size.times do |i|
    spawn(name: "WORKER-#{i}") do
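      # Each worker drains {from, to} seed ranges from the channel until it is closed.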
      while r = channel.receive?
        (r[0]...r[1]).each do |seed|
          steps = collatz(seed)

          if seed % 1_000_000 == 0
            print "Seed: #{seed} Steps: #{steps}\r"
          end
        end
      end
      wg.done
    end
  end

  start = Time.measure do
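    # Producer side: enqueue every batch, push the remainder (if any),
    # then close the channel and wait for all workers to finish.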
    r0 = 0_i64

    batches.times do
      r1 = r0 &+ batch_size
      channel.send({r0, r1})
      r0 = r1
    end

    if total_seeds - batch_size &* batches > 0
      channel.send({r0, total_seeds})
    end

    channel.close
    wg.wait
  end

  puts "\ncollatz took: #{start}"
end

# ---------------- common part ----------------

total_seeds = 1_000_000_000_i64
worker_size = ENV.fetch("CRYSTAL_WORKERS").to_i

p! total_seeds
p! worker_size

puts "-"*100

batch_size = 1000_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100

batch_size = 2000_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100

batch_size = 5000_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100

batch_size = 500_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100

batch_size = 100_i64
batches = (total_seeds // batch_size).to_i32
calculate(total_seeds, batch_size, worker_size, batches)

puts "-"*100

batches = worker_size
batch_size = (total_seeds // batches).to_i32
calculate(total_seeds, batch_size, worker_size, batches)
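
Presumably the snippet was compiled with MT support enabled and run with the worker count set via the environment, along these lines (the file name is an assumption):

crystal build --release -Dpreview_mt collatz_bench.cr
CRYSTAL_WORKERS=16 ./collatz_bench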

Following are the results on Crystal 1.13.2:

total_seeds # => 1000000000
worker_size # => 16
----------------------------------------------------------------------------------------------------
batch_size # => 1000
batches # => 1000000
Seed: 999000000 Steps: 162
collatz took: 00:00:12.577703068
----------------------------------------------------------------------------------------------------
batch_size # => 2000
batches # => 500000
Seed: 999000000 Steps: 162
collatz took: 00:00:12.815965558
----------------------------------------------------------------------------------------------------
batch_size # => 5000
batches # => 200000
Seed: 999000000 Steps: 162
collatz took: 00:00:12.833879590
----------------------------------------------------------------------------------------------------
batch_size # => 500
batches # => 2000000
Seed: 999000000 Steps: 162
collatz took: 00:00:12.944712438
----------------------------------------------------------------------------------------------------
batch_size # => 100
batches # => 10000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.189403820
----------------------------------------------------------------------------------------------------
batch_size # => 62500000
batches # => 16
Seed: 937000000 Steps: 224
collatz took: 00:00:14.090417122

Following are the results on Crystal 1.14.0:

total_seeds # => 1000000000
worker_size # => 16
----------------------------------------------------------------------------------------------------
batch_size # => 1000
batches # => 1000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.265471911
----------------------------------------------------------------------------------------------------
batch_size # => 2000
batches # => 500000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.537970427
----------------------------------------------------------------------------------------------------
batch_size # => 5000
batches # => 200000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.500822759
----------------------------------------------------------------------------------------------------
batch_size # => 500
batches # => 2000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.484737884
----------------------------------------------------------------------------------------------------
batch_size # => 100
batches # => 10000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.781218646
----------------------------------------------------------------------------------------------------
batch_size # => 62500000
batches # => 16
Seed: 999000000 Steps: 162
collatz took: 00:00:13.329298756

Following are the results on the latest master (879ec1247):

total_seeds # => 1000000000
worker_size # => 16
----------------------------------------------------------------------------------------------------
batch_size # => 1000
batches # => 1000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.540662927
----------------------------------------------------------------------------------------------------
batch_size # => 2000
batches # => 500000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.590167441
----------------------------------------------------------------------------------------------------
batch_size # => 5000
batches # => 200000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.690674863
----------------------------------------------------------------------------------------------------
batch_size # => 500
batches # => 2000000
Seed: 999000000 Steps: 162
collatz took: 00:00:13.750579132
----------------------------------------------------------------------------------------------------
batch_size # => 100
batches # => 10000000
Seed: 999000000 Steps: 162
collatz took: 00:00:14.017306410
----------------------------------------------------------------------------------------------------
batch_size # => 62500000
batches # => 16
Seed: 687000000 Steps: 239
collatz took: 00:00:22.480855790

Performance drops significantly on the latest master when creating a buffered channel with a very large buffer size (see the last run).