Loop two String.each_char blocks at the same time

orangeSi · February 28, 2022, 9:59am

Hello, everyone! I want to loop two String.each_char blocks at the same time, like this:

"abc".each_char, "ABC".each_char do |char1, char2|
  puts "chr1=#{char1} chr2=#{char2}"
end

I expect the above code to give this output:

chr1=a chr2=A
chr1=b chr2=B
chr1=c chr2=C

but the above code fails when built with crystal build (Carcin).

orangeSi · February 28, 2022, 10:23am

I found zip, that works!

a = "abc"
b = "ABC"

a.each_char.zip(b.each_char) do |char1, char2|
  puts "chr1=#{char1} chr2=#{char2}"
end

asterite · February 28, 2022, 12:03pm

Also, in case you don’t know it, there’s a useful p! macro. Try this:

a.each_char.zip(b.each_char) do |char1, char2|
  p! char1, char2
end
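
A minimal self-contained sketch of the same loop, for anyone who wants to paste it into a file and run it. The literal strings are placeholders standing in for a and b. p! prints each expression next to its inspected value, so it is handy for quick debugging.

```crystal
a = "abc"
b = "ABC"

# Walk both strings in lockstep, one character from each per iteration.
a.each_char.zip(b.each_char) do |char1, char2|
  # p! prints the expression text together with its inspected value.
  p! char1, char2
end
```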

Blacksmoke16 · February 28, 2022, 2:29pm

Given the strings have to be the same size, you could also just keep it simple and do like:

a.size.times do |idx|
  puts "chr1=#{a[idx]} chr2=#{b[idx]}"
end

Which would probably be pretty efficient.

asterite · February 28, 2022, 2:55pm

If the string is not ASCII, that would actually be quite inefficient.

orangeSi · March 1, 2022, 1:34am

I tested both ways, and size.times takes less time:

a = "ATCG" * 10000000
b = a

3.times do
  t1 = Time.utc
  a.each_char.zip(b.each_char) do |char1, char2|
    x = char1
    x = char2
  end
  t2 = Time.utc
  puts t2 - t1
end

puts
3.times do
  t1 = Time.utc
  a.size.times do |idx|
    x = a[idx]
    x = b[idx]
  end
  t2 = Time.utc
  puts t2 - t1
end

After building with crystal build --release, the output is:

00:00:00.220194917
00:00:00.219663276
00:00:00.221396420

00:00:00.183654377
00:00:00.183720934
00:00:00.183697484

orangeSi · March 1, 2022, 1:42am

Thanks! Got the p!

asterite · March 1, 2022, 12:14pm

It all depends on whether the strings are ASCII or not.

Here’s a benchmark:

require "benchmark"

ascii_string = "ATCG"*1000
unicode_string = "コードギアス"*1000

Benchmark.ips do |x|
  x.report("each_char, ascii") do
    ascii_string.each_char.zip(ascii_string.each_char) do |char1, char2|
    end
  end

  x.report("index, ascii") do
    ascii_string.size.times do |idx|
      ascii_string[idx]
      ascii_string[idx]
    end
  end

  x.report("each_char, unicode") do
    unicode_string.each_char.zip(unicode_string.each_char) do |char1, char2|
    end
  end

  x.report("index, unicode") do
    unicode_string.size.times do |idx|
      unicode_string[idx]
      unicode_string[idx]
    end
  end
end

Output:

  each_char, ascii  51.86k ( 19.28µs) (± 3.19%)  160B/op     1.12× slower
      index, ascii  57.85k ( 17.29µs) (± 4.85%)  0.0B/op          fastest
each_char, unicode  15.80k ( 63.29µs) (± 4.45%)  160B/op     3.66× slower
    index, unicode   6.76  (147.93ms) (± 4.08%)  0.0B/op  8557.72× slower

Indexing a unicode string involves traversing the string from the beginning until we find that index. There’s no faster way to do that. And if you index it first at 0, then at 1, then at 2, etc., it will end up being O(n^2) where n is the size of the string.
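
A small sketch of why the two cases differ, using the same sample strings as the benchmark. When a string is ASCII-only, its byte count equals its character count, so character idx sits at byte idx. With multi-byte characters that no longer holds, and a single pass with each_char_with_index avoids rescanning from the start for every index.

```crystal
ascii = "ATCG"
unicode = "コードギアス"

# ASCII-only: one byte per character, so indexing can jump straight to the byte.
puts ascii.ascii_only?
puts ascii.size == ascii.bytesize

# Multi-byte characters: more bytes than characters, so finding
# character idx means decoding every character before it.
puts unicode.ascii_only?
puts unicode.size
puts unicode.bytesize

# A single pass yields each character together with its index,
# without restarting the scan for every lookup.
unicode.each_char_with_index do |char, idx|
  puts "#{idx}: #{char}"
end
```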

Also, using each_char in the ASCII case is just 1.12× slower, so it might be worth always using that, just to be safe.

mselig · March 3, 2022, 2:25am

Unless some smarts were added to unicode strings to cache pointers into the string and index from the closest cached value. Or maybe simply cache the last used index to cater for the common case of string traversal.
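
A rough sketch of the "cache the last used index" idea, done outside of String itself. CachedCharIndex is a hypothetical helper, not something in Crystal's standard library. It keeps a Char::Reader at the last requested position, so sequential lookups only decode one more character each, while a backwards lookup restarts from the beginning.

```crystal
# Hypothetical helper (not part of the standard library): remembers the
# last character index it visited so that forward lookups are cheap.
class CachedCharIndex
  @reader : Char::Reader
  @index : Int32

  def initialize(@string : String)
    @reader = Char::Reader.new(@string)
    @index = 0
  end

  # Returns the character at character index i, raising IndexError
  # when i is out of bounds.
  def [](i : Int32) : Char
    raise IndexError.new if i < 0

    # Going backwards: this simple sketch restarts from the beginning.
    if i < @index
      @reader = Char::Reader.new(@string)
      @index = 0
    end

    # Advance from the cached position up to the requested index.
    while @index < i
      raise IndexError.new unless @reader.has_next?
      @reader.next_char
      @index += 1
    end

    raise IndexError.new unless @reader.has_next?
    @reader.current_char
  end
end

cursor = CachedCharIndex.new("コードギアス")
puts cursor[0]
puts cursor[1]
puts cursor[2]
```

This only helps the common front-to-back traversal. Random access would still need something like a table of cached byte offsets.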
