How to create fast custom iterators?

Does anyone have a guide for how to create a custom iterator?

For example, I want to be able to return an something Iterable from a function that is trying to search over something. I feel like I’m missing a Crystal idiom for how to string iterators together.

Benchmarking something like this, were I create a struct that includes Iterator, vs something that just returns an array, the array return is much faster.

# This is contrived! I know there are faster ways to sum this. 
class DataHolder
  getter data
  def initialize(@data : Array(Int32))
  end
  
  def find(query : Int32)
    # I want to return an iterator of values that matched in @data
    IterFind.new(self, query, 0)
  end
end

struct IterFind
  include Iterator(Int32)
  def initialize(@inner : DataHolder, @query : Int32, @offset : Int32)
  end
  def next
    while @offset < @inner.data.size
      data = @inner.data[@offset]
      @offset += 1
      if data == @query
        return data
      end
    end
    stop
  end
end

data_holder = DataHolder.new([2, 3, 4, 2, 2, 4, 5, 2])
p data_holder.find(2).sum

The real life code I’m trying to optimize is: https://github.com/sstadick/lapper.cr/blob/00e11d2717e4a24e8cf1c5ec80afdcc8a2105c29/src/lapper.cr#L173

So, the standard library provides a fair amount of iterators which you can combine. Filtering, mapping, and many other things are already covered.

It seems you want to only select some elements that match a given value in an array, then sum them?

array = [2, 3, 4, 2, 2, 4, 5, 2]
p array.
  each. # This creates an Iterator for the Array
  select { |x| x == 2 }. # This filters elements equal to 2
  sum # This sums them

So you probably don’t need to write an iterator.

That said, can you show your benchmark?

Also, what exactly are you trying to optimize? It’s not clear from what you said.

Here’s a benchmark of different ways to sum arrays where a value equals a given target:

require "benchmark"

array = [2, 3, 4, 2, 2, 4, 5, 2]

Benchmark.ips do |x|
  x.report("sum via select") do
    # A bit slow: an intermediate array is created
    array.select { |x| x == 2 }.sum
  end
  x.report("sum with block") do
    # Fastest: directly sum values in the array, no memory allocated
    array.sum { |x| x == 2 ? x : 0 }
  end
  x.report("sum via iterator") do
    # Faster than "sum via select" because of an iterator, but
    # an iterator allocates memory too, and there's more indirection
    array.each.select { |x| x == 2 }.sum
  end
end

Results:

  sum via select  13.14M ( 76.13ns) (± 3.88%)  80.0B/op  11.69× slower
  sum with block 153.56M (  6.51ns) (± 7.20%)   0.0B/op        fastest
sum via iterator  29.91M ( 33.43ns) (± 2.64%)  32.0B/op   5.13× slower

Also a note: I see in your readme you use pointerof. Oh no! That’s so unsafe! Please try to avoid using that. At least that’s the recommendation around here. If you use it carefully it’s fine, but seeing it in a readme might be an immediate “no no” for many.

Thanks for looking at this @HankAnderson! I tried reducing my problem to something more sharable to avoid getting caught up in the optimizing filter/sum. In the process I think I’ve answered my own question. Returning a struct implementing Iterator is not actually that slow. It’s slower than yield, but that should be expected since yield has less moving parts. So I’ll just overload and have both!

require "benchmark"

struct Interval
  getter start, stop

  def initialize(@start : Int32, @stop : Int32)
  end
end

r = Random.new(seed: 12345)

intervals = [] of Interval
10_000.times do
  start = r.rand(100_000)
  stop = start + r.rand(10_000)
  intervals << Interval.new(start, stop)
end
intervals.sort_by! { |iv| {iv.start, iv.stop} }

def find_naive(intervals, start, stop)
  offset = 0
  result = [] of Interval
  while offset < intervals.size
    iv = intervals[offset]
    offset += 1
    if iv.start < stop && iv.stop > start
      result << iv
    elsif iv.start >= stop
      break
    end
  end
  result
end

def find_yield(intervals, start, stop)
  offset = 0
  while offset < intervals.size
    iv = intervals[offset]
    offset += 1
    if iv.start < stop && iv.stop > start
      yield iv
    elsif iv.start >= stop
      break
    end
  end
end

struct IterFind
  include Iterator(Interval)

  def initialize(@intervals : Array(Interval), @off : Int32, @start : Int32, @stop : Int32)
  end

  def next
    while @off < @intervals.size
      iv = @intervals[@off]
      @off += 1
      if iv.start < @stop && iv.stop > @start
        return iv
      elsif iv.start >= @stop
        break
      end
    end
    stop
  end
end

def find_iterator(intervals, start, stop)
  IterFind.new(intervals, 0, start, stop)
end

Benchmark.ips do |x|
  x.report("Naive") {
    total = 0
    intervals.each do |q|
      total += find_naive(intervals, q.start, q.stop).size
    end
  }
  x.report("yield") {
    total = 0
    intervals.each do |q|
      find_yield(intervals, q.start, q.stop) { |iv| total += 1 }
    end
  }
  x.report("iter") {
    total = 0
    intervals.each do |q|
      total += find_iterator(intervals, q.start, q.stop).size
    end
  }
end

Results:

Naive   4.57  (218.79ms) (± 1.69%)  427MB/op   2.37× slower
yield  10.82  ( 92.39ms) (± 4.56%)   0.0B/op        fastest
 iter   5.85  (170.86ms) (± 2.30%)   0.0B/op   1.85× slower

Regarding the use of pointerof, good point. I’ll make some adjustments!

Note that if you have a mutable Iterator, then a struct might be a bit unintuitive depending on how you use it.

Going back to your first example:

data_holder = DataHolder.new([2, 3, 4, 2, 2, 4, 5, 2])

iter1 = data_holder.find(2)
iter1.next
iter1.next
p iter1.to_a # => [2, 2]

iter = data_holder.find(2)
advance_two(iter) # Advanced a copy, because it's a struct
p iter.to_a # => [2, 2, 2, 2]

I’m not sure how well a mutable struct Iterator plays regarding other methods like select, map, etc. But if it works on your particular case then I guess it’s fine. But in the standard library all mutable iterators are classes, never structs.

Also, in your particular example changing the struct iterator to a class gives better performance, even if it allocates more memory. The struct is probably holding a lot of data and it’s expensive to pass it around all the time by copying it. Passing a reference (pointer) is faster.

1 Like

That’s a great point. I’ll try it out with a class and see how that works instead. I think you are right that it was the large struct sizes on my real life benchmark that was making it slow.