The Crystal Programming Language Forum

Segfault on Linux

I have an annoying issue that occurs on Linux but works fine on macOS:

Invalid memory access (signal 11) at address 0x0
[0x561a7c4a6b36] *Exception::CallStack::print_backtrace:(Int32 | Nil) +118
[0x561a7c49397c] __crystal_sigfault_handler +316
[0x7efcceea8980] ???
[0x0] ???

That’s using the latest Ubuntu Docker image.
I’ve isolated the issue to this spec:

I would check whether setting CRYSTAL_LOAD_DWARF=1 or CRYSTAL_LOAD_DWARF=0 leads to a different output. That would confirm or rule out DWARF loading as the source of the problem, since it works differently on Linux and Darwin.


@bcardiff the DWARF loading made no difference.

I used GDB and tracked the segfault to this line:

Stepping through in GDB I see:


Grabbed this just before the segfault:

  1. Breakpoint set on the break statement at line 149
  2. Called step and it jumped into the else branch, but it should have exited the loop

A screenshot of the state just after the second step command:

Although when I overloaded the reschedule function with an alternative implementation I’m still seeing the crash.

What’s also weird is that the crash doesn’t happen consistently either… It crashes 2 out of 3 runs, just to make life difficult.

I solved it by refactoring away from manual Fiber scheduling to channels. I’d built the library before select timeout was available.
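
For anyone landing here later, this is roughly the shape of a timer built on a channel plus select timeout. It is only a minimal sketch, not the library's actual code: start_timer, the done channel and the block are hypothetical names I made up for illustration.

```crystal
# Hypothetical sketch: runs `action` after `period` unless a message
# arrives on `cancel` first. The returned channel receives true if the
# action ran and false if the timer was cancelled.
def start_timer(period : Time::Span, cancel : Channel(Nil), &action : ->) : Channel(Bool)
  # Capacity 1 so the timer fiber can report its result and exit
  # even if nobody is waiting on `done` yet.
  done = Channel(Bool).new(1)

  spawn do
    select
    when cancel.receive
      # Cancelled before the period elapsed: skip the action.
      done.send(false)
    when timeout(period)
      # Period elapsed without a cancel message.
      action.call
      done.send(true)
    end
  end

  done
end

cancel = Channel(Nil).new
done = start_timer(5.seconds, cancel) { puts "timer fired" }

cancel.send(nil)  # wakes the waiting fiber straight away, no manual resume
puts done.receive # false, because the timer was cancelled before it fired
```

The fiber is only ever woken by the scheduler through the channel or the timeout, so nothing resumes a sleeping fiber by hand.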


Good! You mean this change, right? Getting rid of those sleep calls is :rocket:

Maybe the segfault was coming from rescheduling a fiber in an invalid state, so it’s definitely better to use Channel for synchronization.

On that note, maybe you want the cancel channel to have a capacity of 1 (instead of 0) so the #cancel method will never block. I think right now it could block in some race condition: cancel.send blocks until a receiver gets the message, and that is not what you want here.
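
To illustrate the difference in capacity, here is a small sketch. try_cancel is a hypothetical helper written just for this post (it is not part of the library); it uses a non-blocking select so the example never hangs.

```crystal
# Hypothetical helper: attempts a send without blocking and reports
# whether the channel accepted the message.
def try_cancel(cancel : Channel(Nil)) : Bool
  select
  when cancel.send(nil)
    return true
  else
    # A plain `cancel.send(nil)` would have blocked here.
    return false
  end
end

unbuffered = Channel(Nil).new  # capacity 0: a send needs a waiting receiver
buffered = Channel(Nil).new(1) # capacity 1: one message can sit in the buffer

puts try_cancel(unbuffered) # false: nobody is receiving
puts try_cancel(buffered)   # true: the message is stored in the buffer
puts try_cancel(buffered)   # false: the buffer is already full
```

With capacity 1, a single cancel.send returns immediately even if the timer fiber has already finished and will never receive.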


Is this multi-threaded? Any simplified code to repro it?

Not multi-threaded, and no simplified code…
It was pretty hacky fiber manipulation: waking fibers from sleep early, which might have left some dangling pointers somewhere?

This was the fiber timer


  • sleep for the timer period and then perform an action if not cancelled
  • cancelling the timer involved flagging that it was cancelled and then waking it up early to clean up memory

I’m not sure and I could be totally off, but would a mutex lock around @cancelled help?

Shouldn’t be required as it’s running on a single thread, so that’s most likely unrelated.
The issue only occurred on Linux and not on every run, so it’s probably going to be a tough one to track down.

Hard to diagnose without a simplified example. Maybe Valgrind would help? Good luck!

All good, I refactored around it.