Loop over two String.each_char blocks at the same time

Hello, everyone! I want to loop over two String.each_char blocks at the same time, like this:

"abc".each_char, "ABC".each_char do |char1, char2|
  puts "chr1=#{char1} chr2=#{char2}"
end

I expect the above code to give this output:

chr1="a" chr2="A"
chr1="b" chr2="B"
chr1="c" chr2="C"

But the above code fails to compile with crystal build (tested on Carcin).

I found zip, that works!

a.each_char.zip(b.each_char) do |char1, char2|
  puts "chr1=#{char1} chr2=#{char2}"
end
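
For anyone landing here later, a self-contained version of the zip approach might look like the sketch below. The strings a and b and the pairs array are just illustrative names; the array is only there so the result can be inspected after the loop.

```crystal
a = "abc"
b = "ABC"

# Illustrative only: collect each pair so it can be checked afterwards.
pairs = [] of {Char, Char}

# zip pairs up the characters of both strings, one pair per iteration.
a.each_char.zip(b.each_char) do |char1, char2|
  pairs << {char1, char2}
  puts "chr1=#{char1} chr2=#{char2}"
end
```

Note that string interpolation prints the characters without quotes, so the first line printed is chr1=a chr2=A.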

Also, in case you don’t know it, there’s a useful p! macro. Try this:

a.each_char.zip(b.each_char) do |char1, char2|
  p! char1, char2
end

Given the strings have to be the same size, you could also just keep it simple and do something like:

a.size.times do |idx|
  puts "chr1=#{a[idx]} chr2=#{b[idx]}"
end

That would probably be pretty efficient.

If the string is not ASCII, that would actually be quite inefficient :sweat_smile:
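
To see why non-ASCII strings behave differently, it may help to compare the character count with the byte count. A small sketch, with the two sample strings chosen only for illustration:

```crystal
ascii = "ATCG"
unicode = "コードギアス"

# For ASCII-only strings, every character is exactly one byte,
# so a character index maps directly to a byte offset.
puts ascii.size        # 4 characters
puts ascii.bytesize    # 4 bytes
puts ascii.ascii_only? # true

# In UTF-8, each of these katakana characters takes three bytes,
# so a character index no longer maps directly to a byte offset.
puts unicode.size        # 6 characters
puts unicode.bytesize    # 18 bytes
puts unicode.ascii_only? # false
```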

I tested both ways, and size.times takes less time:

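a = "ATCG" * 10_000_000
b = a

# Variant 1: walk both strings in lockstep with zipped char iterators.
3.times do
  t1 = Time.utc
  a.each_char.zip(b.each_char) do |char1, char2|
    x = char1
    x = char2
  end
  t2 = Time.utc
  puts t2 - t1
end

puts

# Variant 2: index both strings by character position.
3.times do
  t1 = Time.utc
  a.size.times do |idx|
    x = a[idx]
    x = b[idx]
  end
  t2 = Time.utc
  puts t2 - t1
end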

After building with $ crystal build --release, the output is:

00:00:00.220194917
00:00:00.219663276
00:00:00.221396420

00:00:00.183654377
00:00:00.183720934
00:00:00.183697484

Thanks! Got the p!

It all depends on whether the strings are ASCII or not.

Here’s a benchmark:

require "benchmark"

ascii_string = "ATCG"*1000
unicode_string = "コードギアス"*1000

Benchmark.ips do |x|
  x.report("each_char, ascii") do
    ascii_string.each_char.zip(ascii_string.each_char) do |char1, char2|
    end
  end

  x.report("index, ascii") do
    ascii_string.size.times do |idx|
      ascii_string[idx]
      ascii_string[idx]
    end
  end

  x.report("each_char, unicode") do
    unicode_string.each_char.zip(unicode_string.each_char) do |char1, char2|
    end
  end

  x.report("index, unicode") do
    unicode_string.size.times do |idx|
      unicode_string[idx]
      unicode_string[idx]
    end
  end
end

Output:

  each_char, ascii  51.86k ( 19.28µs) (± 3.19%)  160B/op     1.12× slower
      index, ascii  57.85k ( 17.29µs) (± 4.85%)  0.0B/op          fastest
each_char, unicode  15.80k ( 63.29µs) (± 4.45%)  160B/op     3.66× slower
    index, unicode   6.76  (147.93ms) (± 4.08%)  0.0B/op  8557.72× slower

Indexing a unicode string involves traversing the string from the beginning until we find that index. There’s no faster way to do that. And if you index it first at 0, then at 1, then at 2, etc., it will end up being O(n^2) where n is the size of the string.
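
If index-based access is really wanted for non-ASCII strings, one possible workaround (not something benchmarked in this thread) is to decode each string once into an array of characters and index the arrays instead. The names a_chars and b_chars are illustrative, and the trade-off is the extra memory for the two arrays.

```crystal
a = "コードギアス"
b = "ABCDEF"

# Decode each string once; this is a single pass per string.
a_chars = a.chars
b_chars = b.chars

# Array indexing is constant time, so the whole loop is linear.
# Assumes both strings have the same number of characters.
a_chars.size.times do |idx|
  puts "chr1=#{a_chars[idx]} chr2=#{b_chars[idx]}"
end
```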

Also, using each_char in the ASCII case is only 1.12× slower, so it might be worth always using it, just to be safe.

Unless some smarts were added to unicode strings to cache pointers into the string and index from the closest cached value. Or maybe simply cache the last used index to cater for the common case of string traversal.
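
As a rough illustration of the "cache the last used index" idea, here is a user-level sketch built on Char::Reader. SequentialIndexer is a hypothetical helper, not part of the standard library and not how String is implemented; it only shows that remembering the last position makes an ascending traversal cheap, while a backwards jump falls back to rescanning from the start.

```crystal
# Hypothetical helper: remembers the last character index it served.
class SequentialIndexer
  def initialize(@string : String)
    @reader = Char::Reader.new(@string)
    @char_index = 0
  end

  # Returns the character at character index idx.
  def [](idx : Int32) : Char
    raise IndexError.new unless 0 <= idx < @string.size

    # Going backwards: restart from the beginning of the string.
    if idx < @char_index
      @reader = Char::Reader.new(@string)
      @char_index = 0
    end

    # Going forwards: advance only from the cached position.
    while @char_index < idx
      @reader.next_char
      @char_index += 1
    end

    @reader.current_char
  end
end

indexer = SequentialIndexer.new("コードギアス")
first = indexer[0]
third = indexer[2]
again = indexer[1] # smaller index, triggers a rescan from the start
```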