There’s a very rough test here.
It shows that performance takes a big hit when a CPU-bound application runs with MT support enabled, if a small number of batches (equal to the core count) with a large batch_size is used.
EDIT:
Here is an example:
Assume we have 10000 parallel tasks and run them on an 8-core, 16-thread laptop. In both solutions CRYSTAL_WORKERS is set to 16, so 16 fibers run in parallel.
Solution one: create 16 buffered channels, each with a capacity of 625.
Solution two: create 100 buffered channels, each with a capacity of 100.
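
For reference, the two layouts can be sketched roughly as follows. This is only a minimal sketch, not the code from the test: it assumes one producer fiber per channel, and `fib`, `run`, and the choice of `fib(20)` as the CPU-bound task are hypothetical stand-ins.

```crystal
# Hypothetical sketch. Build and run with MT support, for example:
#   crystal build --release -Dpreview_mt bench.cr
#   CRYSTAL_WORKERS=16 ./bench

# Stand-in for one CPU-bound task (assumption: the real workload differs).
def fib(n : Int32) : Int32
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Runs channel_count * capacity tasks, one buffered channel per batch.
# Each channel gets its own producer fiber (an assumption about the setup).
def run(channel_count : Int32, capacity : Int32) : Int64
  channels = Array.new(channel_count) { Channel(Int32).new(capacity) }

  channels.each do |ch|
    spawn do
      capacity.times { ch.send(fib(20)) }
    end
  end

  # The main fiber drains every channel and sums the results.
  total = 0_i64
  channels.each do |ch|
    capacity.times { total += ch.receive }
  end
  total
end

# Solution one: 16 channels, capacity 625 each.
elapsed = Time.measure { run(16, 625) }
puts "16 x 625:  #{elapsed.total_milliseconds} ms"

# Solution two: 100 channels, capacity 100 each.
elapsed = Time.measure { run(100, 100) }
puts "100 x 100: #{elapsed.total_milliseconds} ms"
```

Both calls process the same 10000 tasks, so any difference in the timings comes from how the work is split across channels and fibers.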
At least in my test, performance dropped significantly for solution one, but not for solution two.
Perhaps I used it incorrectly?