Running a profiler on the original app I wrote for this thread (the one that yields the fiber and then serializes JSON) shows the event loop needs 12% of the CPU time available to that thread (1.55 seconds out of 12.82):
Most of that time is spent in libevent (1.46s of the 1.55s, or 94.2%), so it can't be optimized any further from within Crystal:
Scheduling accounts for only ~64ms of that 1.55s (about 4%), so any optimization there would yield negligible results.
In a larger app, the event loop is significantly less of a concern. Here is the profile of a Reddit-like app I wrote a while back to test out the neo4j shard:
This hits a Neo4j database and outputs HTML, so it's a reasonably realistic workload. The event loop used 399ms out of 5.98 seconds (~6.7%). Fiber scheduling was only 28ms, or 0.5% of that thread's ~6 CPU seconds. The rest of the event loop time was all libevent.
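For anyone who wants to sanity-check the percentages above, here's a quick script that recomputes them from the raw numbers in the two profiles (the variable names are mine, not anything from the profiler output):

```python
# First app (yield fiber + serialize JSON): event loop vs. whole thread
event_loop_s = 1.55
thread_s = 12.82
libevent_s = 1.46    # portion of the event loop spent in libevent
scheduling_s = 0.064 # portion of the event loop spent scheduling fibers

print(f"event loop share of thread: {event_loop_s / thread_s:.1%}")     # ~12%
print(f"libevent share of loop:     {libevent_s / event_loop_s:.1%}")   # ~94.2%
print(f"scheduling share of loop:   {scheduling_s / event_loop_s:.1%}") # ~4%

# Second app (Reddit-like app hitting Neo4j)
event_loop2_s = 0.399
thread2_s = 5.98
scheduling2_s = 0.028

print(f"event loop share of thread: {event_loop2_s / thread2_s:.1%}")   # ~6.7%
print(f"scheduling share of thread: {scheduling2_s / thread2_s:.1%}")   # ~0.5%
```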
Note: all of these traces contain 6 threads, but I've only shown the heaviest one because they're all essentially the same.
