FWIW that’s what the 5th column is showing: shr and unsafe_as allocate 0 bytes of memory per operation, while io allocates 96 bytes. That allocation is probably why the io one is the slowest.
Any time you see a Benchmark.ips entry taking ~1ns, you’ve hit the floor for how low you can measure. This usually means one or both of these things:
- The operation is faster than 1ns
- LLVM is optimizing out the block entirely
Running your code on my machine indicated that both of these things were happening. To get an accurate benchmark, we need to run multiple iterations within each report block, and we need to invoke a side effect to keep LLVM from optimizing out the report block entirely.
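A minimal sketch of that pattern, in case it helps. This is not your exact code: `low_byte_shr` and `input` are hypothetical stand-ins for whichever method and value you are benchmarking, and I'm only assuming the shape of the report block.

```crystal
require "benchmark"

# Hypothetical stand-in for one of the methods under test (not the original code).
def low_byte_shr(value : UInt32) : UInt8
  ((value >> 8) & 0xff).to_u8
end

iterations = 1_000
input = 0x1234_5678_u32

# Allocated at the outermost scope, so writing to it is a side effect
# that outlives the report block.
result = [0_u8]

Benchmark.ips do |x|
  x.report("shr") do
    # Many calls per measurement, so one measurement is well above the ~1ns floor.
    iterations.times do |i|
      # Vary the input and store the output so the call is not trivially removable.
      result[0] = low_byte_shr(input &+ i)
    end
  end
end
```

Each additional method would get its own `x.report` block with the same loop around it. Whether this is enough to defeat the optimizer can depend on the method, so it is worth checking that the reported times actually move when you change `iterations`.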
Here, each measurement runs the method 1,000 times and stores the result in an element of an array allocated at the outermost scope, which serves as our side effect. This fixes both issues above. With that in place, these are the results on my machine:
Every time reported here is 1000x the per-call time, because each measurement covers iterations = 1_000 calls. So the entries reported in nanoseconds actually take picoseconds per iteration, and the one reported in microseconds actually takes nanoseconds per iteration.
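To make the conversion concrete, here is a small sketch of the arithmetic. The `reported_ns` value is a made-up example, not one of the numbers from my run.

```crystal
iterations = 1_000

# Hypothetical time reported for one measurement (one run of the report block).
reported_ns = 250.0

# 1 ns = 1_000 ps, and the measurement covers `iterations` calls.
per_iteration_ps = reported_ns * 1_000 / iterations

puts "#{per_iteration_ps} ps per iteration"
```

With iterations = 1_000 the two factors of 1,000 cancel, which is why you can read a reported "ns" figure directly as "ps per iteration".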