GLib main loop and Fibers integration

Hi, I was playing with the GLib main loop to try to make it integrate well with Crystal fibers, so people could write GTK applications using Crystal fibers.

So far, so good. I'm basically using the approach I learned from an @asterite comment in this forum, i.e. create a thread; call the blocking function; Fiber.yield; return.

If I compile my example to an executable it works. However, if I wrap it in a test case it hangs and never returns unless I pass -Dpreview_mt.

For the sake of simplicity I removed any GLib binding code and called the C functions directly.

require "spec"

@[Link("glib-2.0", pkg_config: "glib-2.0")]
lib LibGLib
  fun g_main_context_new : Pointer(Void)
  fun g_main_loop_new(context : Pointer(Void), is_running : LibC::Int) : Pointer(Void)
  fun g_main_loop_run(this : Void*) : Void
  fun g_main_loop_quit(this : Void*) : Void
  fun g_main_loop_is_running(this : Void*) : LibC::Int
end

def main_loop_run(loop : Pointer(Void))
  channel = Channel(Nil).new
  # Need to use Thread.new to be sure it will have its own thread and not
  # share it with other fibers.
  Thread.new do
    puts "main loop run on thread #{Fiber.current.name}!"
    LibGLib.g_main_loop_run(loop)
    channel.send(nil)
    puts "main loop thread finished"
    Fiber.yield
  end
  puts "waiting for channel msg from main loop thread"
  channel.receive
  puts "Got the msg!"
end

describe "Fiber integration with GLib main loop" do
  it "works" do
    ctx = LibGLib.g_main_context_new
    loop = LibGLib.g_main_loop_new(ctx, 0)

    spawn(name: "quit") do
      while LibGLib.g_main_loop_is_running(loop).zero?
        puts "waiting main loop to start..."
        Fiber.yield
      end
      puts "calling 'g_main_loop_quit' from thread #{Fiber.current.name}!"
      LibGLib.g_main_loop_quit(loop)
      puts "quit fiber finished!"
    end


    main_loop_run(loop)
    puts "main loop call returned!"
    puts "Why I don't quit!! whyyyy!!??? 😭️"
  end
end

Running this with crystal spec fibers_spec.cr -Dpreview_mt works fine as expected, but if I remove -Dpreview_mt it hangs after the test case finishes:

$ crystal spec fibers_spec.cr 
waiting for channel msg from main loop thread
main loop run on thread main!
calling 'g_main_loop_quit' from thread quit!
quit fiber finished!
main loop thread finished
Got the msg!
main loop call returned!
Why I don't quit!! whyyyy!!??? 😭️
.

What am I doing wrong?

Last night I ran into a very similar “why can’t this quit?!” issue and found it was because I had Channels with no size. Changing their size to 10 (an arbitrary value) fixed everything. I had forgotten that Channels are not like general purpose message passing queues, and if the Fiber that would normally drain the Channel was already dead, I couldn’t send a new message and things would just hang.
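For anyone who hits the same thing, here is a minimal sketch of the difference between an unbuffered and a buffered Channel. The variable names are made up for illustration, and the 100 ms timeout is an arbitrary value; this is not the code from either of our projects.

```crystal
# An unbuffered channel: send blocks until some fiber receives.
unbuffered = Channel(Int32).new
# A buffered channel with room for one value: send returns immediately
# while there is free space, even if nobody is receiving yet.
buffered = Channel(Int32).new(1)

unbuffered_sent = false
select
when unbuffered.send(1)
  unbuffered_sent = true
when timeout(100.milliseconds)
  # No fiber is receiving, so the send never completes and we give up.
end

buffered.send(1) # does not block: the value is stored in the buffer
buffered_value = buffered.receive

puts "unbuffered send completed: #{unbuffered_sent}"
puts "buffered value: #{buffered_value}"
```

Without the select and timeout, the unbuffered send would simply hang, which is the behaviour I ran into.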

The weird part is that if I remove the Spec stuff and compile and run it as a normal app it works, but the code running as a test case hangs.

Don’t mix Threads you have created yourself with anything that involves the event loop. I.e. both channels and yielding/sleeping will not behave the way you want, because there is no run loop and the newly created thread simply does not integrate with the event loop. As soon as anything goes to sleep it won’t wake up again. If scheduled manually, any fibers from those threads would either not come back at all or come back on one of the regular threads. What can be done in manually created threads is very, very limited, and they should probably be considered an internal interface. There has also been recent work on making them more private.

There is at least one library that provides threads that have run loops and integrate with the event loop, but it does so by monkey patching a lot of private interfaces, and it can break (and has!) whenever a new release is made.

In the code snippet I create the thread myself because with spawn, even with -Dpreview_mt, there is no guarantee that the block won’t share its thread with other fibers. Since I call a blocking C function, any other fiber on that same thread would not run until the C function returns and I call Fiber.yield. So I believe this is a use case where Thread.new (which was soft-removed from the stdlib) is not only valid but the only way to make this work.


I’m not 100% sure, but this feels like the same reason that will/crystal-pg was ported to use Crystal for the wire protocol instead of continuing to use libpq under the hood. A C library that blocks the thread blocks the entire thread, including the Crystal fiber scheduler, until its blocking condition is met.
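A rough sketch of what I mean, assuming the default single-threaded runtime (no -Dpreview_mt). LibBlockingDemo is a made-up binding name that just exposes the C sleep function, and the timings are arbitrary.

```crystal
# Hypothetical binding, only for this demo: the plain C sleep(3).
lib LibBlockingDemo
  fun sleep(seconds : LibC::UInt) : LibC::UInt
end

ticks = 0
spawn do
  loop do
    ticks += 1
    sleep 10.milliseconds # fiber-aware sleep, goes through the event loop
  end
end

Fiber.yield # let the ticker fiber run once

# Blocking C call: the whole thread is stuck in C, so the scheduler
# cannot resume the ticker fiber in the meantime.
LibBlockingDemo.sleep(1)
ticks_after_c_sleep = ticks

# Crystal's own sleep suspends only this fiber; the ticker keeps running.
sleep 100.milliseconds
ticks_after_fiber_sleep = ticks

puts "ticks after C sleep: #{ticks_after_c_sleep}"
puts "ticks after fiber sleep: #{ticks_after_fiber_sleep}"
```

In single-thread mode the first number should stay where it was before the C call, while the second one grows.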

Even in single-thread?

When I was working on bcardiff/crystal-fswatch, I found that the fswatch library also needs a dedicated thread. In Crystal's single-thread mode, something that won’t work is using channels to communicate between threads. I think this is something that could be affecting your code.

The runtime in single-thread mode does not allow channels to be used in custom threads. (Unless I missed some updates 🙈)

To work around that, in the library I created something called ThreadPortal. It’s a plain wrapper around a channel when MT is enabled, but an IO-based sync for single thread.
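To give an idea of the IO-based approach, here is a minimal sketch. It is not the actual ThreadPortal code; run_blocking_in_thread is a hypothetical helper, and it assumes Thread.new is still reachable in your Crystal version (it is an internal interface).

```crystal
# Hypothetical helper: runs the block on a dedicated OS thread and
# suspends only the calling fiber until the block has finished.
def run_blocking_in_thread(&block : ->) : UInt8?
  reader, writer = IO.pipe

  Thread.new do
    block.call
    # Signal completion with a raw write(2) so the custom thread never
    # touches the Crystal scheduler or event loop.
    byte = 1_u8
    LibC.write(writer.fd, pointerof(byte).as(Void*), 1)
  end

  # Reading from the pipe goes through the event loop, so other fibers
  # keep running while this one waits for the signal byte.
  signal = reader.read_byte
  reader.close
  writer.close
  signal
end

result = 0
signal = run_blocking_in_thread { result = (1..1000).sum }
puts "signal: #{signal}, result: #{result}"
```

The point of the design is that the only thing crossing the thread boundary is a file descriptor, which the regular event loop already knows how to wait on.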

Exactly, this is why I create the thread using Thread.new, to be sure the Crystal::Scheduler doesn’t play any role with this thread and let it be happy and isolated.

As I said, it works fine with/without -Dpreview_mt, but when I use the spec library it hangs, and it hangs after all my code finishes, as shown in the debug messages. I need to dig into the spec code to find out why…

Yes, because even on single thread Thread.new creates a new real thread no matter what.

If I comment out the Spec-related lines (the require "spec" line, the describe and it lines, and their two matching end lines) and run the code with/without -Dpreview_mt, it works. So the issue seems to be inside the Spec code.

$ crystal run spec/fibers_spec.cr 
waiting for channel msg from main loop thread
main loop run on thread main!
calling 'g_main_loop_quit' from thread quit!
quit fiber finished!
main loop thread finished
Got the msg!
main loop call returned!
Why I don't quit!! whyyyy!!??? 😭️
hugo ~/src/gi-crystal   fibers  2.7.6p219 14:57:24
$ crystal run -Dpreview_mt spec/fibers_spec.cr 
waiting main loop to start...
waiting main loop to start...
waiting main loop to start...
waiting main loop to start...
waiting main loop to start...
waiting main loop to start...
waiting main loop to start...
waiting for channel msg from main loop thread
main loop run on thread main!

calling 'g_main_loop_quit' from thread quit!
quit fiber finished!
main loop thread finished
Got the msg!
main loop call returned!
Why I don't quit!! whyyyy!!??? 😭️
hugo ~/src/gi-crystal   fibers  2.7.6p219 14:57:33

I revisited this issue today because I need to do some network operations in a GTK application of mine without freezing the UI, and I think I found a solution for integrating the GLib main loop with the Crystal main loop without going into the depths of the Crystal::EventLoop libevent implementation.

@[Link("glib-2.0", pkg_config: "glib-2.0")]
lib LibGLib
  fun g_main_context_default : Void*
  fun g_main_context_iteration(ctx : Void*, may_block : Int32) : Int32
  fun g_timeout_add_seconds(interval : UInt32, func : Void* -> Int32, data : Void*) : UInt32
end

def glib_counter(_data)
  puts "glib counting"
  1
end

def crystal_counter
  loop do
    sleep(1)
    puts "crystal counting"
  end
end

# Get the default GLib main context.
ctx = LibGLib.g_main_context_default

# Start GLib counter, triggered by GLib main loop events.
LibGLib.g_timeout_add_seconds(1, ->glib_counter(Void*), Pointer(Void).null)
# Start a Crystal counter, triggered by Crystal main loop events.
spawn crystal_counter

# Wait 10 seconds then quit.
spawn { sleep(10); exit }

Thread.new do
  loop do
    # Let the glib main loop run, blocking the thread if there's no event yet.
    LibGLib.g_main_context_iteration(ctx, 1)
  end
end

Channel(Nil).new.receive

This POC works for this plain GLib code. The next step is to change it to work with g_application_run.

It should print:

crystal counting
glib counting
crystal counting
glib counting
crystal counting
glib counting
crystal counting
glib counting
crystal counting
glib counting
crystal counting
glib counting
crystal counting
glib counting
crystal counting
glib counting
crystal counting
glib counting

Nice! It works because threads happen to lazily start their local scheduler + dedicated event loop when you trigger a fiber reschedule or enqueue, which any IO will trigger.

You must enable -Dpreview_mt so the stdlib becomes thread safe (Schedulers, IO, Channel, …). You may disable the multiple worker threads with CRYSTAL_WORKERS=1 if you only need one thread for Crystal plus another for GLib / Gtk. I’m doing just that in a tiny Gtk3 app (very short-lived, yet capable of running HTTP requests without blocking the UI).

That works but… without any guarantees (it’s hacky, at best). This is something we aim to fix with, for example, RFC 0002: MT Execution Contexts (pull request #2 in crystal-lang/rfcs, by ysbaddaden).
