@jzakiya I just tried your version on my 10-year-old laptop with 4c/8t.
It performs quite well, but it requires a lot of memory: with a batch size of 1_000_000 it takes ~126MB of RES memory and ~9GB of VIRT memory. That virtual memory isn't really allocated, but it still seems to count somehow: with a smaller batch size, the GC panics with out-of-memory (OOM) errors. It finishes in 1m07s.
I tweaked my version above to allocate a channel buffer large enough for all batches (so the fibers never block). It only needs 3MB of RES memory and 0.7GB of VIRT memory, and it finishes in 1m06s.
I reduced the batch size to 1000: it then needed 18MB of RES memory (larger channel buffer), VIRT memory didn't change, and it finished in… 1m05s. One second faster.
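A minimal sketch of the "full buffer" idea, assuming a hypothetical `compute_batch` workload (summing the integers of each batch) and illustrative sizes; it is not the code from either version in this thread. The results channel gets one slot per batch, so a worker's `send` never has to wait for the main fiber. Fibers only run in parallel when the program is built with `-Dpreview_mt`; without that flag the result is the same, computed on a single thread.

```crystal
# Hypothetical workload: the sum of the integers in [first, first + size).
def compute_batch(first : Int64, size : Int64) : Int64
  sum = 0_i64
  size.times { |i| sum += first + i }
  sum
end

total_items = 10_000_000_i64
batch_size = 1_000_i64
batches = (total_items // batch_size).to_i # 10_000 batches
workers = 8                                # e.g. one fiber per hardware thread on a 4c/8t CPU

# One buffer slot per batch: `send` never blocks waiting for `receive`.
results = Channel(Int64).new(batches)

workers.times do |w|
  spawn do
    # Static striding: worker w handles batches w, w + workers, w + 2 * workers, ...
    b = w
    while b < batches
      results.send(compute_batch(b.to_i64 * batch_size, batch_size))
      b += workers
    end
  end
end

total = 0_i64
batches.times { total += results.receive }
puts total
```

Because the capacity equals the number of batches, a smaller batch size means a bigger buffer, which matches the higher RES memory observed with 1000-item batches.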
Takeaways:
- starting more CPU-bound fibers than there are hardware threads only hogs more memory: it won't be faster (it may even be slower, since starting lots of fibers takes time);
- channels are fast, but you must pay attention to the buffer size: too small and the threads will have to wait, too large and you will queue more jobs than can actually be processed (for a benchmark like this one, we don't care).
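Both takeaways can be sketched together: a fixed pool of `System.cpu_count` workers pulling jobs from a small bounded channel. The workload (`count_multiples_of_3`), the sizes and the `workers * 2` capacity are hypothetical choices for illustration, not tuned values; as above, real parallelism assumes a `-Dpreview_mt` build.

```crystal
# Hypothetical CPU-bound job: count the multiples of 3 in [first, first + size).
def count_multiples_of_3(first : Int64, size : Int64) : Int64
  count = 0_i64
  size.times { |i| count += 1 if (first + i) % 3 == 0 }
  count
end

total_items = 10_000_000_i64
batch_size = 1_000_i64
batches = (total_items // batch_size).to_i

# One worker per hardware thread: more CPU-bound fibers would only cost memory.
workers = Math.max(1, System.cpu_count.to_i)

# Small buffers: the producer waits when the workers fall behind (backpressure)
# instead of queueing every job up front.
jobs = Channel(Int64).new(workers * 2)
results = Channel(Int64).new(workers * 2)

spawn do
  batches.times { |b| jobs.send(b.to_i64 * batch_size) }
  jobs.close # lets `receive?` return nil once the queue is drained
end

workers.times do
  spawn do
    while first = jobs.receive?
      results.send(count_multiples_of_3(first, batch_size))
    end
  end
end

total = 0_i64
batches.times { total += results.receive }
puts total
```

Closing `jobs` is what ends the workers' loops; since the main fiber knows how many results to expect, it doesn't need a separate signal to know the workers are finished.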
Tips:
- Crystal 1.13 introduced `WaitGroup` to replace `Channel(Nil)`.
- You don't need to collect each result, you only need 1 every 1_000_000 (but still have to save 'em all).
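A minimal sketch of the `WaitGroup` tip, assuming Crystal 1.13 or later. Each fiber calls `done` in an `ensure`, so the counter is decremented even if the work raises, and the main fiber blocks on a single `wait` instead of calling `receive` once per fiber on a `Channel(Nil)`. The striped summing workload and the `partials` array are hypothetical.

```crystal
require "wait_group"

workers = 4
limit = 1_000_000_i64
partials = Array(Int64).new(workers, 0_i64) # one slot per fiber, no shared counter
wg = WaitGroup.new(workers)                 # expects `workers` calls to `done`

# The Channel(Nil) idiom: each fiber ends with `done.send(nil)`,
# and the main fiber calls `done.receive` once per fiber.
workers.times do |w|
  spawn do
    begin
      # Hypothetical workload: fiber w sums the integers i < limit with i % workers == w.
      sum = 0_i64
      i = w.to_i64
      while i < limit
        sum += i
        i += workers
      end
      partials[w] = sum
    ensure
      wg.done
    end
  end
end

wg.wait # suspends the main fiber until every fiber has called `done`
total = partials.sum
puts total
```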