Finding where a fiber is halt

I was needing to find a reason why a given fiber was hanged. This situation can happen when waiting for IO or Channel operation that seems it will never arrive.

To solve this I, temporally, added the call-stack of the fiber before yielding, so given a Fiber instance I will be able to print where it has stopped.

It was great to have that information so explicitly. Here is the patch I used and a small example.

The example spawn a fiber that need two receive actions two finish. After sending one, the main fiber asks where the spawned fiber f is hanged. The backtrace shows exactly it is on the second receive.

# file: foo.cr
require "./fiber-debug"

def foo(ch)
  bar(ch)
end

def bar(ch)
  ch.receive
  ch.receive
end

ch = Channel(Nil).new

f = spawn(name: "worker") do
  foo(ch)
end

ch.send nil

f.print_backtrace
# Fiber's backtrace (name: worker):
# fiber-debug.cr:17:27 in 'resume'
# crystal-0.36.1/src/crystal/scheduler.cr:48:5 in 'resume'
# crystal-0.36.1/src/fiber.cr:197:5 in 'resume'
# crystal-0.36.1/src/crystal/scheduler.cr:149:9 in 'reschedule'
# crystal-0.36.1/src/crystal/scheduler.cr:44:5 in 'reschedule'
# crystal-0.36.1/src/channel.cr:311:7 in 'receive'
# foo.cr:9:3 in 'bar'    <-- The second ch.receive 🎉🎉🎉
# foo.cr:4:3 in 'foo'
# foo.cr:15:3 in '->'
# crystal-0.36.1/src/primitives.cr:255:3 in 'run'
# crystal-0.36.1/src/fiber.cr:92:34 in '->'

ch.send nil

This is the temporal extension I used.

# file: fiber-debug.cr
class Fiber
  property call_stack : Exception::CallStack?

  def print_backtrace
    puts "Fiber's backtrace (name: #{name}):"
    if call_stack = @call_stack
      call_stack.printable_backtrace.each { |l| puts l }
    else
      puts "(running, unable to get backtrace)"
    end
    puts
  end
end

class Crystal::Scheduler
  protected def resume(fiber : Fiber) : Nil
    @current.call_stack = Exception::CallStack.new
    previous_def
    @current.call_stack = nil
  end
end
6 Likes

Thinking in that direction, it would already help to have only the call location for where the execution was yielded (i.e. foo.cr:9:3), right? That could even be resolved from magic constants. Thus it would be very lightweight and could perhaps be added to stdlib, maybe behind a debug flag.

That information in this case would’ve been enough, yes. But, the app could be larger and the whole call-stack might be useful. And discarding all the std-lib part of the call-stack is a bit harder, all methods need to pass the last app callee information.

Yeah, I suppose limited usefulness in complex application probably doesn’t make it worth the effort.

But perhaps we could consider adding your extension behind a flag. I’d see that as a useful generic debugging tool.

Great idea. I’ve wanted something similar, it might also be useful for profiling, possibly if all fibers could be enumerated.

ability to retrieve callstacks of all fibers · Issue #4683 · crystal-lang/crystal · GitHub

1 Like

It works like a charm! Thank you for the code.

I tried also to use fiber_trace.cr, but I think it ignores main fiber.

Small snippet to print fibers with fiber_debug.cr on Ctrl+\:

Signal::QUIT.trap do
  Fiber.unsafe_each {|f| puts "#{f.name} -> #{f.running?} #{f.dead?}"; f.print_backtrace if !f.running? && !f.dead? }
end
1 Like