Looks like you caught the bug while I was writing this. The latency of this implementation is about half of the others on my machine, which is incredible. Previously, the fastest implementations in this thread ran in 60-61ns per iteration on my machine, and yours took 30ns.
I was able to whittle an implementation down to just over 59ns on my machine by writing a purpose-built String.random
constructor that bypasses a few instructions in the regular string constructor. I was shaving off at most 2ns, so I didn’t post it because it took a lot of effort to shave off so little time.
Yours is so much faster because random_bytes
operates on a full machine word at a time rather than byte-by-byte (asterite has talked in the past about how memcpy
does the same thing for this reason), but I also noticed it was doing an extra heap allocation. It was allocating 80B per iteration vs 48B for the previously fastest implementations.
I know from experience that, on my machine, an extra object allocation adds 10-15ns in average iteration time. So if it’s this much faster with an extra allocation, I’m sure we can improve it even more — at least for strings up to a given size. So I iterated on it and came up with this:
def random_string8(length : Int32, random = Random::DEFAULT) : String
# change chars length to 64 by adding dash and underscore
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
if length <= 1024
buffer = uninitialized UInt8[1024]
bytes = buffer.to_slice[0...length]
else
bytes = Bytes.new(length)
end
# then all valid indices are in [0,63], so just get a bunch of bytes
# and divide until they're guaranteed to be small enough
# (this seems to be about as fast as a right shift; the compiler probably optimizes it)
random.random_bytes(bytes)
bytes.map! { |v| v % chars.bytesize }
# and then use the buffer-based string constructor to set the characters
String.new(capacity: length) do |buffer|
bytes.each_with_index do |chars_index, buffer_index|
buffer[buffer_index] = chars.byte_at(chars_index)
end
# return size and bytesize (might differ if chars included non-ASCII)
{length, length}
end
end
If the string you want is less than or equal to 1KB, it will put the buffer on the stack, avoiding a heap allocation. This is how much faster it is than anything else in this thread on my machine:
1 16.63M ( 60.12ns) (± 1.08%) 48.0B/op 4.04× slower
2 15.78M ( 63.39ns) (± 0.91%) 80.0B/op 4.26× slower
3 14.30M ( 69.95ns) (± 0.90%) 112B/op 4.70× slower
4 5.54M (180.64ns) (± 0.95%) 112B/op 12.14× slower
5 15.15M ( 66.00ns) (± 1.01%) 80.0B/op 4.44× slower
6 16.20M ( 61.72ns) (± 1.07%) 48.0B/op 4.15× slower
7 49.60M ( 20.16ns) (± 0.91%) 48.0B/op 1.35× slower
8 67.21M ( 14.88ns) (± 1.19%) 48.0B/op fastest
7
is my implementation mentioned above, which uses aspects of your implementation and still falls behind.
Another thing that helps your implementation is that you use a 64-byte alphabet. The latency increases by about 50% (~22ns) when I change it to a 62-byte alphabet.
Full benchmark
def random_string1(len) : String
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
String.new(len) do |bytes|
bytes.to_slice(len).fill { chars.to_slice.sample }
{len, len}
end
end
def random_string2(len) : String
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
String.new(Bytes.new(len) { chars.to_slice.sample })
end
def random_string3(len) : String
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
String.new (1..len).map { chars.to_slice.sample }.to_unsafe, len
end
def random_string4(length : Int32) : String
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
String.new Pointer.malloc(3*length) { chars.to_unsafe[rand(62)] }, length
end
def random_string5(length : Int32) : String
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
String.new Bytes.new(length) { chars.to_unsafe[rand(62)] }
end
def random_string6(length : Int32) : String
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".to_unsafe
String.new length do |buffer|
length.times { |index| buffer[index] = chars[rand(62)] }
{length, length}
end
end
def random_string7(length : Int32) : String
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".to_slice
String.random length, chars
end
def random_string8(length : Int32, random = Random::DEFAULT) : String
# change chars length to 64 by adding dash and underscore
chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
if length <= 1024
buffer = uninitialized UInt8[1024]
bytes = buffer.to_slice[0...length]
else
bytes = Bytes.new(length)
end
# then all valid indices are in [0,63], so just get a bunch of bytes
# and divide until they're guaranteed to be small enough
# (this seems to be about as fast as a right shift; the compiler probably optimizes it)
random.random_bytes(bytes)
bytes.map! { |v| v % chars.bytesize }
# and then use the buffer-based string constructor to set the characters
String.new(capacity: length) do |buffer|
bytes.each_with_index do |chars_index, buffer_index|
buffer[buffer_index] = chars.byte_at(chars_index)
end
# return size and bytesize (might differ if chars included non-ASCII)
{length, length}
end
end
puts random_string1 20
puts random_string2 20
puts random_string3 20
puts random_string4 20
puts random_string5 20
puts random_string6 20
puts random_string7 20
puts random_string8 20
require "benchmark"
Benchmark.ips do |x|
x.report "1" { random_string1 20 }
x.report "2" { random_string2 20 }
x.report "3" { random_string3 20 }
x.report "4" { random_string4 20 }
x.report "5" { random_string5 20 }
x.report "6" { random_string6 20 }
x.report "7" { random_string7 20 }
x.report "8" { random_string8 20 }
end
def String.random(bytesize : Int32, alphabet : Bytes, random = Random::DEFAULT)
str = GC.malloc_atomic(bytesize.to_u32 + HEADER_SIZE + 1).as(UInt8*)
buffer = str.as(String).to_unsafe
bytes = Bytes.new(buffer, bytesize)
random.random_bytes bytes
bytes.map! { |byte| byte % alphabet.bytesize }
alphabet = alphabet.to_unsafe
bytesize.times do |index|
buffer[index] = alphabet[buffer[index]]
end
buffer[bytesize] = 0_u8
set_crystal_type_id(str)
str = str.as(String)
str.initialize_header(bytesize.to_i, bytesize.to_i)
str
end
module Benchmark
def self.batch_ips(*args, **kwargs)
BatchIPS.new(*args, **kwargs)
end
class BatchIPS
def report(msg = "")
end
end
end