Crystal spec hangs on Apple Silicon after linking

Hi everyone,

I’m experiencing an issue where crystal spec completes the Parse/Semantic and all Codegen stages (bc+obj/linking/dsymutil), but then hangs completely and never executes the specs.

Here is the output with -p -s:

$ crystal spec
^C

Aborted!
Finished in 5:40 minutes

$ crystal spec -p -s
Parse:                             00:00:00.000543875 (   1.27MB)
Semantic (top level):              00:00:00.305503792 ( 199.34MB)
Semantic (new):                    00:00:00.001553541 ( 199.34MB)
Semantic (type declarations):      00:00:00.020784875 ( 199.34MB)
Semantic (abstract def check):     00:00:00.006858708 ( 215.34MB)
Semantic (restrictions augmenter): 00:00:00.006153750 ( 231.34MB)
Semantic (ivars initializers):     00:00:00.008899000 ( 231.34MB)
Semantic (cvars initializers):     00:00:00.087336542 ( 263.39MB)
Semantic (main):                   00:00:00.894053917 ( 712.27MB)
Semantic (cleanup):                00:00:00.000304000 ( 712.27MB)
Semantic (recursive struct check): 00:00:00.000601084 ( 712.27MB)
Codegen (crystal):                 00:00:00.737221750 ( 777.27MB)
Codegen (bc+obj):                  00:00:00.511846916 ( 811.27MB)
Codegen (linking):                 00:00:00.206456875 ( 811.27MB)
dsymutil:                          00:00:00.209412083 ( 811.27MB)

Codegen (bc+obj):
 - all previous .o files were reused
^C

Aborted!
Finished in 17.36 seconds
0 examples, 0 failures, 0 errors, 0 pending

Key Observations:

  • Individual specs run fine when executed separately.

  • The full suite passes cleanly in our Linux CI (owasp-noir/noir@130a746 on GitHub).

  • This issue consistently happens on macOS / Apple Silicon.

  • Clearing the cache (rm -rf ~/.cache/crystal) does not fix the issue; it still hangs right after the Codegen/linking stage.

  • CPU Usage: While hanging, both the crystal spec process and the compiled crystal-run-spec.tmp binary sit at 0.0% CPU (state S+). I think this indicates a deadlock or a blocking wait rather than an infinite loop.

  • You can reproduce this by cloning the OWASP Noir repository and running crystal spec.

Environment:

  • macOS: Tahoe 26.3

  • Crystal: 1.19.1

  • Architecture: Apple Silicon (ARM64)

I suspect this might be a macOS-specific LLVM or spec runner issue, possibly a deadlock when the runner tries to execute the compiled spec binary. Has anyone seen this behavior before, or have any pointers on how to debug this further?

Thanks!

The trailer indicates that the spec binary actually executes, but it seems to hang for some reason.

You can pass the --verbose option to see whether the runner actually starts running any example.

Otherwise, you could try connecting a debugger to see where the executable is stuck.


From Activity Monitor you can sample the process while it is hanging. That can give you enough information in some cases: is a thread blocked? Why?
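If you prefer the terminal, macOS ships a sample CLI that does the same thing as Activity Monitor’s “Sample Process”. A sketch (the process name, PID, and output file here are placeholders):

```console
# find the PID of the hanging spec binary (the name may differ on your machine)
$ pgrep -lf crystal-run-spec

# capture 5 seconds of call stacks from that process
$ sample <pid> 5 -file spec-hang.txt

# then look in spec-hang.txt for threads parked in kevent / psynch_cvwait
```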


I checked out a previous commit from the commit log and ran the specs, and they passed, which indicates the cause is something in the code.

I used git bisect to find the offending commit, based on whether the specs run, and it points to this commit. I can’t find anything in that commit that would cause this to happen but, sure enough, the specs run just fine at the parent commit.


Even with the --verbose flag added, the process halts at the same stage. It seems debugging is ultimately necessary!

$ crystal spec -p -s --verbose
Parse:                             00:00:00.000138416 (   1.17MB)
Semantic (top level):              00:00:00.506577167 ( 195.11MB)
Semantic (new):                    00:00:00.001646625 ( 195.11MB)
Semantic (type declarations):      00:00:00.020967084 ( 211.11MB)
Semantic (abstract def check):     00:00:00.007668250 ( 227.11MB)
Semantic (restrictions augmenter): 00:00:00.006614125 ( 227.11MB)
Semantic (ivars initializers):     00:00:00.008941000 ( 243.11MB)
Semantic (cvars initializers):     00:00:00.088758250 ( 259.16MB)
Semantic (main):                   00:00:00.914010000 ( 692.02MB)
Semantic (cleanup):                00:00:00.000309042 ( 692.02MB)
Semantic (recursive struct check): 00:00:00.000605375 ( 692.02MB)
Codegen (crystal):                 00:00:01.007679167 ( 775.02MB)
Codegen (bc+obj):                  00:00:00.359948959 ( 807.02MB)
Codegen (linking):                 00:00:00.487133375 ( 807.02MB)
dsymutil:                          00:00:00.541338334 ( 807.02MB)

Codegen (bc+obj):
 - all previous .o files were reused

Thanks for the suggestion! I’ll sample the process in Activity Monitor while it’s hanging and check the backtraces.

Thanks for running the bisect!

I also noticed the issue started exactly at that commit. I focused on the newly added specs for the hang analysis, but I haven’t pinpointed the exact cause yet.

I’ll dig deeper over the weekend and update here.

From the sound of it, the specs start but the program gets stuck waiting on the event loop: 0% CPU, and the SIGINT signal is received and handled, meaning the program isn’t blocked on a syscall, but for some reason a fiber isn’t being resumed.

You could build an executable with:

$ crystal build -o owasp_spec spec/*_spec.cr spec/**/*_spec.cr

You can then run lldb (or gdb or another debugger) on owasp_spec, interrupt the program, and see where it hangs… but it probably won’t tell you much: likely stuck on kevent (or epoll_wait), which just means “waiting on the evloop”.
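Concretely, such a session might look like this (an illustrative sketch, assuming the owasp_spec binary built above):

```console
$ lldb ./owasp_spec
(lldb) run
# ... once it hangs, press Ctrl-C to drop back to the lldb prompt ...
(lldb) bt all        # backtraces for every thread
(lldb) thread list   # check which threads are parked in kevent
```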

Tracing might be more interesting:

$ export CRYSTAL_TRACE=evloop,sched
$ export CRYSTAL_TRACE_FILE=trace.log
$ crystal spec -Dtracing

The trace.log file will report everything that happened related to the fiber scheduler and the event loop while the program executes.

Other ideas:

  • Does it reproduce in a Linux VM? It might be an issue on AArch64 rather than macOS.
  • Does it reproduce with -Dpreview_mt -Dexecution_context to enable execution contexts?
  • With execution contexts, perf-tools allows printing scheduler/fiber details (and there’s pending work that could print the backtrace of non-running fibers).

I followed @ysbaddaden’s tracing suggestion and did some more bisection digging in the noir project. Here’s what turned up.

Environment
Crystal 1.19.1 (2026-01-20)
LLVM: 22.1.0
Target: aarch64-apple-darwin25.2.0
macOS 26.2 (Apple Silicon)

Step by step what I found

  1. crystal spec spec/unit_test/ spec/functional_test/ → hangs immediately (0 examples)
  2. Narrowed it down: only unit_test/llm/ + functional_test/ together trigger it. Other unit_test subdirs are fine.
  3. The llm specs pull in ../src/llm/adapterollama.cr → crest → http_proxy → http/proxy/server.cr → require "wait_group"
  4. Minimal repro with just two files:

# spec/_repro_spec.cr
require "spec"
require "wait_group"

describe("wg") { it("ok") { true.should be_true } }

$ crystal spec spec/_repro_spec.cr spec/functional_test/testers/javascript/express_false_positives_client_spec.cr

→ hangs right away, 0 examples, 0% CPU.

The functional tests create a NoirRunner that runs tech analyzers concurrently using WaitGroup.wait (analyzer.cr:125). WaitGroup internally uses Fiber.suspend, Crystal::SpinLock, and Atomic, and on Apple Silicon’s kqueue-based event loop the fibers seem to never resume when the two suites are combined.
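For context, the stdlib WaitGroup pattern the analyzers rely on looks roughly like this. This is a minimal standalone sketch, not the actual analyzer.cr code; the squaring work is a stand-in for one analyzer:

```crystal
require "wait_group"

results = [] of Int32
mutex = Mutex.new
wg = WaitGroup.new

4.times do |i|
  wg.add # one pending unit of work per fiber
  spawn do
    begin
      # stand-in for one concurrent tech analyzer
      mutex.synchronize { results << i * i }
    ensure
      wg.done # always signal completion, even if the work raises
    end
  end
end

wg.wait # blocks this fiber until all four done calls arrive
puts results.sort # [0, 1, 4, 9]
```

A second WaitGroup definition loaded into the same program would break exactly the wg.wait handoff shown here, which matches the symptom of a fiber never resuming.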

Ran the trace

CRYSTAL_TRACE=evloop,sched CRYSTAL_TRACE_FILE=trace.log \
  crystal spec spec/_repro_spec.cr spec/functional_test/testers/javascript/express_false_positives_client_spec.cr -Dtracing

Main fiber goes straight into evloop.run (blocking=1) and stays there — stuck on kevent for ~4.8s+.
Only the stack-pool-collector wakes up occasionally; the spec runner fiber never gets scheduled again.
Trace also showed around 1052 fibers in total when it hangs (way more than when running the dirs separately).

My guess is there’s some interaction between the WaitGroup from http_proxy and the one used in the functional tests, which only happens when they get compiled together on Apple Silicon (it doesn’t reproduce on Linux, probably an epoll vs kqueue difference).

Anyway, that’s what I found so far. Let me know what you think… happy to run more tests if it helps!

Yes, you can’t create a ::WaitGroup type because there’s already one in the stdlib (for quite a few versions now). If both happen to get loaded, then either implementation is monkey-patching the other, and they conflict. Since it’s a synchronization type, a fiber is almost certain to never resume at some point.

At the very least, the custom implementation should be namespaced, as Noir::WaitGroup for example, or the global WaitGroup should only be loaded for older Crystal versions.

What happens if you remove your custom implementation and just use the one from stdlib? It should be a bit more optimized anyway.
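On the “only load it for older Crystal versions” point, a compile-time version guard could look like this. A sketch only: the fallback file path and Noir::WaitGroup are hypothetical, and I’m assuming the stdlib WaitGroup landed in Crystal 1.11:

```crystal
# Prefer the stdlib WaitGroup on modern compilers; only fall back to a
# vendored, namespaced copy on older ones (assumed cutoff: 1.11.0).
{% if compare_versions(Crystal::VERSION, "1.11.0") >= 0 %}
  require "wait_group"
{% else %}
  require "./noir/wait_group" # hypothetical vendored Noir::WaitGroup
  alias WaitGroup = Noir::WaitGroup
{% end %}

wg = WaitGroup.new
wg.wait # counter is zero, so this returns immediately
```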


Thank you! Your help resolved the issue!
