Some questions about the Fiber scheduler tracing log

zw963 · October 25, 2024, 6:18pm

Question 1: Why does the fiber scheduler call sched.reschedule on a fiber immediately after that fiber has finished?

The fiber is about to exit, so rescheduling it seems meaningless. Doesn't the scheduler know the fiber is exiting?

Here is an example log:

sched.resume 11824863278048 thread=0x7e079774b740:? fiber=0x7e0797742f00:main fiber=0x7e0797742d20:fiber-each1

print: reach spawn end for fiber-each1

sched.reschedule 11824863287426 thread=0x7e079774b740:? fiber=0x7e0797742d20:fiber-each1

sched.resume 11824863291184 thread=0x7e079774b740:? fiber=0x7e0797742d20:fiber-each1 fiber=0x7e0797742c80:fiber-each2

print: reach spawn end for fiber-each2

sched.reschedule 11824863297325 thread=0x7e079774b740:? fiber=0x7e0797742c80:fiber-each2

Question 2: The following lines are printed without any context, and I don't really understand what they do. Is there any documentation that explains them?

I don't even know where fiber=0x7e0797742e60:Stack pool collector is spawned from, because the trace log has no output showing where it was spawned.

Anyway, I assume it doesn't affect my understanding of the fiber flow. So is printing this information necessary for debugging the scheduler?

sched.resume 11824735945636 thread=0x7e079774b740:? fiber=0x7e0797742f00:main fiber=0x7e0797742e60:Stack pool collector

sched.sleep 11824735950125 thread=0x7e079774b740:? fiber=0x7e0797742e60:Stack pool collector for=5000000000

sched.resume 11824735978759 thread=0x7e079774b740:? fiber=0x7e0797742e60:Stack pool collector fiber=0x7e0797742dc0:Signal Loop
sched.reschedule 11824735998236 thread=0x7e079774b740:? fiber=0x7e0797742dc0:Signal Loop

If source code is needed to reproduce this, I will add it.

Thanks

kojix2 · October 25, 2024, 11:58pm

I wanted to understand how rescheduling works in Crystal, so here’s my investigation.

First, I started with fiber.cr.

From that, I found that Crystal::Scheduler is responsible for managing Fiber execution. Next, I asked ChatGPT to look at scheduler.cr and explain it.

It turns out that Fiber.swapcontext handles the switch from the current Fiber to another by changing the execution context. But where is this swapcontext implemented? It’s here.

fiber
├── context
│   ├── aarch64.cr
│   ├── arm.cr
│   ├── i386.cr
│   ├── interpreted.cr
│   ├── wasm32.cr
│   ├── x86_64-microsoft.cr
│   └── x86_64-sysv.cr
├── context.cr
└── stack_pool.cr

These files are architecture-specific. On x86_64 Linux, the relevant file is x86_64-sysv.cr, which follows the System V ABI. I asked ChatGPT for further clarification and learned that assembly code is used here to directly manipulate registers and the stack.

Two functions are important: makecontext and swapcontext.

makecontext sets fiber_main as the entry point for the Fiber’s first execution.
swapcontext switches execution to a new Fiber by updating the stack pointer (though, to be honest, the finer details are still a bit over my head).

Next, I asked ChatGPT about stack_pool.cr.

This revealed that each Fiber is assigned 8 MiB of stack memory, with logic to handle memory allocation and deallocation efficiently.

With these surface-level insights, I circled back to my initial question: How does the scheduler assign the next task to a Fiber?

Here’s the relevant reschedule method:

protected def reschedule : Nil
  loop do
    if runnable = @lock.sync { @runnables.shift? }
      resume(runnable) unless runnable == @thread.current_fiber
      break
    else
      Crystal.trace :sched, "event_loop" do
        @event_loop.run(blocking: true)
      end
    end
  end
end

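This method takes the next runnable Fiber from the front of the queue and resumes it, unless it is the Fiber that is already running. If the queue is empty, it calls @event_loop.run(blocking: true) to wait for the next event and then checks the queue again.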
@event_loop refers to EventLoop, which is implemented here.
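The queue handling in the reschedule method above can be modeled with a small toy class. This is only a sketch for illustration, not Crystal's real scheduler: ToyScheduler, enqueue, and wait_for_event are hypothetical names, fibers are represented by plain strings, and the event loop is replaced by a stub that makes one more "fiber" runnable.

```crystal
# Toy model of the reschedule loop; NOT Crystal's real scheduler.
# All names here (ToyScheduler, enqueue, wait_for_event) are hypothetical.
class ToyScheduler
  # Names of the "fibers" that were resumed, in order.
  getter resumed = [] of String

  def initialize(@current : String)
    @runnables = Deque(String).new
  end

  # Marks a "fiber" as runnable by appending it to the back of the queue.
  def enqueue(name : String) : Nil
    @runnables << name
  end

  # Same shape as the method quoted above: take the first runnable,
  # resume it unless it is already the current one, then stop.
  def reschedule : Nil
    loop do
      if runnable = @runnables.shift?
        unless runnable == @current
          @resumed << runnable
          @current = runnable
        end
        break
      else
        wait_for_event
      end
    end
  end

  # Stand-in for @event_loop.run(blocking: true): pretend that an event
  # arrived and made one more "fiber" runnable.
  private def wait_for_event : Nil
    enqueue("woken-by-event")
  end
end

sched = ToyScheduler.new("main")
sched.enqueue("fiber-each1")
sched.enqueue("fiber-each2")
3.times { sched.reschedule } # the third call finds the queue empty
p sched.resumed
```

Note that reschedule never puts the calling fiber back into the queue. It only decides what runs next, which is why a fiber that is about to finish can call it safely.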

On Linux, Crystal uses Crystal::LibEvent::EventLoop.new, which is a binding for libevent.

It seems that LibEvent2.event_base_loop waits for the right moment to resume processing.

Most likely, libevent2 uses Linux system calls such as epoll or select to efficiently monitor events.

zw963 · October 26, 2024, 10:07am

Oops, thanks a lot. I guess I misunderstood the meaning of reschedule when I asked this question.

The incorrect understanding before

sched.resume fiber-each1
print: reach spawn end for fiber-each1
sched.reschedule ??? fiber=0x7d6718574b40:fiber-each1

I was wrong to think that the log above means the exiting fiber-each1 is added back to the queue.

The correct understanding now

reschedule just finds the next fiber that needs to be resumed and pops it from the front of the queue to resume it. AFAIK, reschedule only happens when necessary. In my code example, it happens in the following cases:

The first reschedule, on the main fiber, finds the Signal Loop fiber.
The second reschedule, on the Signal Loop fiber, determines the order in which fibers are resumed. After it runs, fibers are resumed in a fixed order.
A fiber is exiting, so the scheduler needs to pick the next fiber to run.
A Channel#send operation.
A sleep call.

There are probably other cases; I will update this if I learn of them.
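The cases listed above (sleep, Channel#send, and a fiber reaching its end) can be observed with a small program. This is a minimal sketch and not the original poster's code: the events array and the done channel are names made up for this example, and the fiber names are borrowed from the log. Assuming a Crystal version with runtime tracing support, building with -Dtracing and running with CRYSTAL_TRACE=sched should print sched.* lines similar to the ones quoted in this thread.

```crystal
# Records what each fiber does so the order can be inspected afterwards.
events = [] of String
# Unbuffered channel, used only so that main waits for both fibers.
done = Channel(Nil).new

{"fiber-each1", "fiber-each2"}.each do |name|
  spawn(name: name) do
    events << "#{name}: start"
    sleep 10.milliseconds # suspends this fiber; the scheduler picks another one
    events << "#{name}: reach spawn end"
    done.send(nil) # may suspend this fiber until main receives
    # The block returns here. The fiber is finished and is not put back on
    # the run queue; the scheduler just moves on to the next runnable fiber.
  end
end

events << "main: waiting"
2.times { done.receive } # suspends main until a fiber sends
events << "main: done"

events.each { |event| puts event }
```

Each fiber prints its "reach spawn end" line exactly once, which matches the corrected understanding: the reschedule logged after a fiber ends selects the next fiber and does not run the finished one again.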
