```crystal
a = "ATCG" * 10000000
b = a

# Compare characters pairwise with each_char iterators
3.times do
  t1 = Time.monotonic # monotonic clock: better for timing than Time.utc
  a.each_char.zip(b.each_char) do |char1, char2|
    x = char1
    x = char2
  end
  t2 = Time.monotonic
  puts t2 - t1
end

puts

# Compare characters pairwise with String#[] indexing
3.times do
  t1 = Time.monotonic
  a.size.times do |idx|
    x = a[idx]
    x = b[idx]
  end
  t2 = Time.monotonic
  puts t2 - t1
end
```
It all depends on whether the strings are ASCII-only or not.

Here’s a benchmark:
```crystal
require "benchmark"

ascii_string = "ATCG" * 1000
unicode_string = "コードギアス" * 1000

Benchmark.ips do |x|
  x.report("each_char, ascii") do
    ascii_string.each_char.zip(ascii_string.each_char) do |char1, char2|
    end
  end

  x.report("index, ascii") do
    ascii_string.size.times do |idx|
      ascii_string[idx]
      ascii_string[idx]
    end
  end

  x.report("each_char, unicode") do
    unicode_string.each_char.zip(unicode_string.each_char) do |char1, char2|
    end
  end

  x.report("index, unicode") do
    unicode_string.size.times do |idx|
      unicode_string[idx]
      unicode_string[idx]
    end
  end
end
```
Indexing into a unicode string involves traversing it from the beginning until we reach that index, because in UTF-8 characters occupy a variable number of bytes; there’s no faster way to do it. And if you index it first at 0, then at 1, then at 2, and so on, the whole loop ends up being O(n^2), where n is the size of the string.
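To make that concrete, here’s a minimal sketch of what a character lookup has to do under the hood. `char_byte_offset` is a made-up helper for illustration, not what the standard library actually uses; the snippet is written as plain Ruby but should also compile as Crystal:

```ruby
# Finding the idx-th character of a UTF-8 string requires scanning
# from the first byte, because characters are 1-4 bytes long and we
# can't know where character idx starts without decoding everything
# before it.
def char_byte_offset(bytes, idx)
  offset = 0
  idx.times do
    lead = bytes[offset]
    offset += if lead < 0x80
                1 # ASCII: 1 byte
              elsif lead < 0xE0
                2 # 2-byte sequence
              elsif lead < 0xF0
                3 # 3-byte sequence
              else
                4 # 4-byte sequence
              end
  end
  offset
end

# Each katakana character is 3 bytes in UTF-8:
puts char_byte_offset("コードギアス".bytes, 2) # => 6
# For ASCII every character is 1 byte, so offset == index:
puts char_byte_offset("ATCG".bytes, 3) # => 3
```

Calling this for idx = 0, 1, 2, …, n-1 re-decodes everything before each index, which is where the O(n^2) comes from.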
Also, using each_char in the ASCII case is only about 1.12x slower, so it might be worth always using it, just to be safe.
That is, unless some smarts were added to unicode strings to cache pointers into the string and index from the closest cached value. Or maybe simply cache the last used index, to cater for the common case of traversing the string from front to back.
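That last-index idea could look something like the sketch below. `CachedString` is entirely hypothetical (no such type exists in the standard library); it’s plain Ruby that should also compile as Crystal:

```ruby
# Hypothetical wrapper that remembers the byte offset of the last
# character index it resolved. Sequential access (0, 1, 2, ...) then
# advances one character per lookup instead of rescanning from byte 0.
class CachedString
  def initialize(string)
    @bytes = string.bytes
    @last_char = 0 # character index of the cached position
    @last_byte = 0 # its byte offset
  end

  def byte_offset(char_index)
    # Restart from the beginning only when seeking backwards;
    # otherwise continue from the cached position.
    if char_index < @last_char
      @last_char = 0
      @last_byte = 0
    end
    while @last_char < char_index
      lead = @bytes[@last_byte]
      @last_byte += lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4
      @last_char += 1
    end
    @last_byte
  end
end

s = CachedString.new("コードギアス")
puts s.byte_offset(5) # => 15, scans 5 three-byte characters
puts s.byte_offset(5) # => 15 again, no scanning this time
```

With this, a front-to-back loop over indexes 0..n-1 does O(n) total work instead of O(n^2), at the cost of a couple of extra fields per string.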