Following the changes from RFC 7 (see RFC 0007: Event loop refactor), I started experimenting with epoll and kqueue, which led me to open multiple pull requests that quickly got superseded over the last 2 months.
The long development has come to an end: the final pull request is ready!
If there are any brave souls, you can try to review it (please) but be aware that the changes are significant Hopefully there should be enough comments and documentation and the order of commits can help to grasp some of the introduced concepts.
It most likely wonāt make it into 1.14 since weāre so close to the release. Even behind a preview flag, it might not make much sense to push something that might have issues in a release. Letās polish it over the development of 1.15 instead.
Alternatively, thereās nothing preventing you from testing the branch: checkout the branch and recompile your application, and hopefully ā¦ otherwise please report regressions or issues. In the very best scenario (e.g. purely IO bound with long lived connections) you might experience up to +20% improvement , at worst it shouldnāt be slower than libevent.
The changes are significant yet there are many things I kept for future improvements, like a skiplist for timers, integrating signalfd / EVFILT_SIGNAL as well as pidfd / EVFILT_PROC, and more things. This is only the ācoreā of more improvements to the event loop
I documented myself on io_uring and have some preliminary design ideas about how to integrate it in Crystal, but that will have to wait. Iām working on RFC 2 now, and trying to take advantage of that new evloop
Thanks a lot @ysbaddaden Just compiled a 7K LOC Kemal application with this branch. Run some benchmarks with wrk and I didnāt see any errors, the performance is on par with Crystal 1.13.1 maybe %1 - %2 faster which is within error margin.
āMost application shouldnāt notice any impact because of this design choice, since a fd is usually not shared across fibers (concurrency issues), except maybe a server socket with multiple accepting fibers? In that case youāll need to make sure the fibers are on the same thread (preview_mt) or same context (RFC 2).ā
Does this mean that under certain conditions -Dpreview_mt is or becomes obsolete?
Yes. A new multithreading environment is coming up, as discussed in RFC 2. The event loop refactor is a prerequisite for that. Itāll replace the old -Dpreview_mt.
I tested this with the current Benben trunk code, about 71K LOC total, and things run/compile just fine. Performance seems to take a decent hit, though Iām also comparing Crystal 1.12.2 to this PR. I compiled Crystal myself both times.
Rendering 79 VGM files to WAV, four in parallel on four cores:
Crystal v1.12.2: rendering took 03:28.087905842 with an avg samples/sec of 906,261
This PR: rendering took 03:49.727951144 with an avg samples/sec of 821,117
@MistressRemilia Damn. I assume Benben is working with disk files, right? Meaning that it should never block (disk files are always ready) and thus never enter epoll. Maybe this is an impact of the EventLoop#read and #write methods that are slightly different?
Would you have some reproducible steps? I donāt know where to find VGM files for instance.
@MistressRemilia I checked out your code (nice, I learned a bit of Fossil), and you only use disk files, no sockets, and I canāt seem to pinpoint anything fundamentally different in both evloops to explain why it would be slower to read, write or close from files (itās always blocking).
Then the next explanation would be the timers: sleeps (obviously), fiber yield (they create a 0 second timer). You donāt use IO timeouts or select with timeouts, but you do use sleep and Fiber.yield.
Theyāre the weak point of this evloop refactor: the Crystal::Evented::Timers is just a placeholder for a better data structure; itās thread unsafe and is protected by a spin lock (not even a mutex) which creates a contention point for threads, etc.
Is this to be understood as in āitās known, and will changeā or as in āitās a limitation of the epoll eventloop refactor and we canāt do much about itā?
Yeah, itās only writing stuff to disk, no sockets or anything. Specifically, using --render will cause it to write rendered audio data to WAV (or Au) files. If you want some VGM files to test, check VGMRips (maybe the X68000 version of Castlevania since I use it as part of my benchmarking). Or just test it with FLAC/MP3/Ogg/WavPack/etc.
FWIW, I also checked it against 1.13.3 just now and the regression doesnāt appear in that version. So I think you may be on to something wit the Timers.