Locked code in benchmarks

wontruefree · April 18, 2023, 4:12pm

I have some code that seems to work in practice, in testing, but not in benchmarks.

I am trying to understand why. When I add print statements or do any other kind of debugging, I can see several runs complete, but the code gets stuck before completing enough runs to produce a benchmark result. I am not an experienced MT (multi-threaded) programmer, so I am hoping this is just a concept I have not learned yet.

Here is an extract of the code that is not working:

require "benchmark"

def locking_code(arr)
  channel = Channel(Int64).new

  spawn do
    acc = 0_i64
    index = 0

    while index < arr.size
      acc += arr[index]
      index += 2
    end

    channel.send(acc)
  end

  spawn do
    acc = 0_i64
    index = 1

    while index < arr.size
      acc += arr[index]
      index += 2
    end

    channel.send(acc)
  end

  channel.receive + channel.receive
end

arr = (1_i64..100_000_i64).to_a

Benchmark.ips do |ips|
  ips.report("locking") { locking_code(arr) }
end

I am running this on an M2 MacBook Pro:

Darwin jbook.local 22.4.0 Darwin Kernel Version 22.4.0: Mon Mar  6 20:59:58 PST 2023; root:xnu-8796.101.5~3/RELEASE_ARM64_T6020 arm64 arm Darwin

I am building the binary with:

crystal build -Dpreview_mt --release benchmarks/locking_code.cr -o bin/locking

and running it with:

CRYSTAL_WORKERS=8 bin/locking

Also, weirdly, using Fiber.yield works for me. It has poor performance, though, so I tried channels instead, which is how I got down this path. If anyone can explain the performance difference between Fiber.yield and channels, I would be really interested.

rob · April 24, 2023, 12:59pm

I’m not sure I can get behind the idea of benchmarking anything that locks, or anything that could get blocked by an external system at all. Wouldn’t any blocking that occurs render the benchmark pointless?

It would seem to me that you could separate your algorithm from the mutex concerns, then benchmark the algorithm and tune it for performance in a single-threaded environment.
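As a rough sketch of that separation, the summing loop from the original post could be pulled out of the fibers and benchmarked on its own, with no `spawn` or `Channel` involved. The `strided_sum` helper and its `start` and `step` parameters are hypothetical names introduced here for illustration; they are not from the original code.

```crystal
require "benchmark"

# Hypothetical helper: sums every `step`-th element of `arr`, beginning at
# index `start`. This is the same loop body each fiber ran in the original
# post, minus the channel send.
def strided_sum(arr : Array(Int64), start : Int32, step : Int32) : Int64
  acc = 0_i64
  index = start
  while index < arr.size
    acc += arr[index]
    index += step
  end
  acc
end

arr = (1_i64..100_000_i64).to_a

Benchmark.ips do |ips|
  # Both halves run back to back on the current fiber, so nothing can block.
  ips.report("strided sum, no fibers") do
    strided_sum(arr, 0, 2) + strided_sum(arr, 1, 2)
  end
end
```

This only measures the arithmetic. It says nothing about why the channel version hangs, but it gives a baseline to compare against once the multi-threaded version runs to completion.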

zw963 · April 24, 2023, 1:52pm

Following your process, I get output like this:

 ╰─ $ CRYSTAL_WORKERS=8 ./locking
locking  11.89k ( 84.09µs) (± 7.85%)  432B/op  fastest

It works for me. Does that make sense?

Crystal 1.8.0 [14bfa992e] (2023-04-14)

LLVM: 15.0.7
Default target: x86_64-pc-linux-gnu

HertzDevil · April 24, 2023, 2:05pm

If it only reproduces on AArch64, it might be some variant of "Bug in atomic operations on aarch64 with multi-threading" (Issue #13010, crystal-lang/crystal on GitHub) or "More places might need memory barriers on AArch64" (Issue #13055, crystal-lang/crystal on GitHub).

wontruefree · April 29, 2023, 6:13am

This is good to know. Honestly, with MT I should be checking multiple architectures.
