In our CI system, we’ve had a handful of builds die with a segfault (SIGSEGV) while running specs, in particular a section of our specs that use Timecop.travel
. So far I can not reproduce this locally, and can’t reproduce it in CI either – it’s just caused a few CI builds to fail and then they succeed when the build is retried. It’s happened in both 1.10.1 and 1.9.2. Any ideas on how could I go about debugging this?
Here’s the relevant output from CI logs:
Invalid memory access (signal 11) at address 0x4
[0x562c5a2ac9c6] *Exception::CallStack::print_backtrace:Nil +118 in /app/.cache/crystal/crystal-run-spec.tmp
[0x562c5a2349d6] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil) +310 in /app/.cache/crystal/crystal-run-spec.tmp
[0x7f22cafc1fd0] ?? +139787411136464 in /lib/x86_64-linux-gnu/libc.so.6
[0x562c5a41f951] *Timecop::TimeStackItem#monotonic:Time::Span +17 in /app/.cache/crystal/crystal-run-spec.tmp
[0x562c5a3e16a3] *Time::monotonic:Time::Span +67 in /app/.cache/crystal/crystal-run-spec.tmp
[0x562c5a7221b1] *IdleGC::collect:Nil +33 in /app/.cache/crystal/crystal-run-spec.tmp
[0x562c5a289be5] ~procProc(Nil) +389 in /app/.cache/crystal/crystal-run-spec.tmp
[0x562c5a36bf52] *Fiber#run:(IO::FileDescriptor | Nil) +114 in /app/.cache/crystal/crystal-run-spec.tmp
[0x562c5a2344f6] ~proc2Proc(Fiber, (IO::FileDescriptor | Nil)) +6 in /app/.cache/crystal/crystal-run-spec.tmp
[0x0] ???
Command exited with non-zero status 11
Command being timed: "crystal spec --verbose --stats -Dpreview_mt"
I’ve looked at:
- Time.monotonic
- Crystal::System::Time.monotonic – which is where the actual
LibC.clock_gettime(LIBC::CLOCK_MONOTONIC, ...)
syscall happens - Timecop::TimeStackItem#monotonic
But don’t see any obvious issues. Per the libc docs on clock_gettime
, “This function cannot fail.”
I’ll note that the IdleGC.collect
call in the stacktrace means this may be happening in a different Fiber
than the Timecop.travel
call. But still, a segfault seems really surprising.
Apologies, I realize it’s not too useful since I can’t reproduce this, but wondering what could potentially cause a segfault. Thanks for your help.