Why such big differences?

Could you tell us what kind of performance differences you see?

I can run the code myself and see what comes from that. But your results might be different and we wouldn’t be talking about the same data.

This is the output on my machine (with --release):

664579
00:00:02.656616357
664579
00:00:01.523873195
664579
00:00:00.043423086

So solutions 1 and 2 are in the same ballpark, but solution 3 is roughly 35 to 60 times faster.

I suspect that the nested loop in solution 3 might be easier for the compiler to optimize. It’s essentially just a StepIterator. The other examples have more complex chained iterators that are harder to optimize.

You can gain a bit more performance by dropping this iterator entirely: instead of step(*args).each { block } you can write step(*args) { block }. This runs the block directly without allocating an iterator instance. That opens the door to further compiler optimizations, and it is already much faster even in a non-optimized build, so it's an instant performance boost.
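To make the difference concrete, here is a minimal sketch of the two forms. The numbers and variable names are made up for illustration and are not taken from your code; both loops visit the same values, only the second one skips the intermediate iterator object.

```crystal
# Form 1: step without a block returns an iterator, which .each then walks.
sum_via_each = 0
3.step(to: 15, by: 3).each { |i| sum_via_each += i }

# Form 2: step with a block yields the values directly, no iterator involved.
sum_via_block = 0
3.step(to: 15, by: 3) { |i| sum_via_block += i }

puts sum_via_each  # 3 + 6 + 9 + 12 + 15 = 45
puts sum_via_block # same values, same sum
```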

There are a couple more unnecessary .each calls in your code. You can drop them all. In most cases it doesn't make much difference when you already have an iterator (because then it's a no-op). Sometimes, as mentioned in the previous paragraph, it has a significant effect.
Similarly, .each.count in the measure blocks should be just .count. This ends up using BitArray#count, which is more optimized than Iterator#count.
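A small sketch of the .count point, using a hypothetical stand-in for your sieve (the size, the variable names, and the bits set here are my own assumptions). Both calls return the same number; the second one is dispatched on the BitArray itself instead of going through an iterator.

```crystal
require "bit_array"

# Hypothetical stand-in for the sieve: 16 bits, with the primes below 16 set.
flags = BitArray.new(16)
[2, 3, 5, 7, 11, 13].each { |i| flags[i] = true }

# Goes through an iterator first, so the generic count is used.
count_via_iterator = flags.each.count(true)

# Called on the BitArray directly.
count_direct = flags.count(true)

puts count_via_iterator
puts count_direct
```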
