Locked code in benchmarks

I have some code that seems to work in practice and in testing, but not in benchmarks.

I am trying to understand why. When I add print statements or do any other kind of debugging, I can see that several runs complete, but the code gets stuck before completing enough runs to produce a benchmark result. I am not an experienced multithreading (MT) programmer, so I am hoping this is just a concept I have not learned yet.

Here is an extract of the code that is not working:

require "benchmark"

def locking_code(arr)
  channel = Channel(Int64).new

  spawn do
    acc = 0_i64
    index = 0

    while index < arr.size
      acc += arr[index]
      index += 2
    end

    channel.send(acc)
  end

  spawn do
    acc = 0_i64
    index = 1

    while index < arr.size
      acc += arr[index]
      index += 2
    end

    channel.send(acc)
  end

  channel.receive + channel.receive
end

arr = (1_i64..100_000_i64).to_a

Benchmark.ips do |ips|
  ips.report("locking") { locking_code(arr) }
end

I am running this on an M2 MacBook Pro:

Darwin jbook.local 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:58 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6020 arm64 arm Darwin

I am building the binary with:

crystal build -Dpreview_mt --release benchmarks/locking_code.cr -o bin/locking

and running it with:

CRYSTAL_WORKERS=8 bin/locking

Also, weirdly, using Fiber.yield works for me, but it performs poorly, so I was trying channels instead, which is how I got down this path. If anyone can explain Fiber.yield vs. channel performance, I would be really interested.


I’m not sure I can get behind the idea of benchmarking anything that locks — or anything that could get blocked by an external system at all. Wouldn’t any blocking that occurs render the benchmark pointless?

It seems to me that you could separate your algorithm from the locking concerns, then benchmark the algorithm and tune it for performance in a single-threaded environment.
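As a rough sketch of that suggestion, the summing loop from the original post could be pulled out into a plain method and benchmarked without fibers or channels. The names `strided_sum` and `sequential_code` are hypothetical and not from the original code; this only measures the algorithm itself and says nothing about why the multi-threaded version hangs.

```crystal
require "benchmark"

# Hypothetical helper (not in the original post): sums every second
# element of `arr`, starting at index `start`.
def strided_sum(arr : Array(Int64), start : Int32) : Int64
  acc = 0_i64
  index = start

  while index < arr.size
    acc += arr[index]
    index += 2
  end

  acc
end

# Same work as locking_code, but run sequentially on the current fiber,
# so nothing in it can block on a channel.
def sequential_code(arr : Array(Int64)) : Int64
  strided_sum(arr, 0) + strided_sum(arr, 1)
end

arr = (1_i64..100_000_i64).to_a

Benchmark.ips do |ips|
  ips.report("sequential") { sequential_code(arr) }
end
```

Once the single-threaded number is known, it can serve as a baseline for judging whether splitting the work across fibers is worth the coordination overhead at all.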


Following your process, I get output like this:

$ CRYSTAL_WORKERS=8 ./locking
locking  11.89k ( 84.09µs) (± 7.85%)  432B/op  fastest

It works for me. Does that make sense?

Crystal 1.8.0 [14bfa992e] (2023-04-14)

LLVM: 15.0.7
Default target: x86_64-pc-linux-gnu

If it only reproduces on AArch64, it might be some variant of "Bug in atomic operations on aarch64 with multi-threading" (crystal-lang/crystal issue #13010) or "More places might need memory barriers on AArch64" (crystal-lang/crystal issue #13055).


This is good to know. Honestly with MT I should check multiple architectures.