Is this a good way to generate a random string?

def random_string(lenght) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  res = IO::Memory.new
  lenght.times do
    "".rjust(res, 1, chars[Random.rand(62)])
  end
  res.to_s
end

I don’t really need anything better than that. I just wonder if there is anything that would outperform this function, to squeeze out a little bit of performance.

1 Like

The overall mechanism seems good, but there are a couple details that could be improved for more idiomatic code:

  • String.build instead of IO::Memory
  • append to the io (<<) instead of rjust
  • chars.to_slice.sample instead of random index access
  • spelling: length
def random_string(length) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.build(length) do |io|
    length.times do
      io << chars.to_slice.sample
    end
  end
end
2 Likes

If the goal is just to get a string, you could do something like Random::DEFAULT.hex length // 2. Otherwise, if it has to be alphanumeric, then you have to do something yourself.
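For reference, a minimal sketch of that approach (assuming the default Random is acceptable for the use case). Random#hex returns two hex characters per random byte, so asking for length // 2 bytes yields a string of length hex digits:

length = 20
puts Random::DEFAULT.hex(length // 2) # => a 20-character hex string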

3 Likes

simpler version:

def random_string(len) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new(Bytes.new(chars.to_slice.sample(len).to_unsafe, len))
end

Update:

As @straight-shoota mentioned, sample(len) will not produce duplicates, so it does not meet the requirements. Please ignore this answer.

This is not equivalent to the above, though. Enumerable#sample(Int) works without replacement, thus characters won’t occur more than once.
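A quick throwaway illustration of the difference (my own sketch, not from the original post):

chars = "abc".to_slice
p chars.sample(3)               # picks each of 'a', 'b', 'c' exactly once, in random order
p Bytes.new(3) { chars.sample } # independent draws, so repeats are possible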

1 Like

Maybe this is right:

def random_string(len) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new(len) do |bytes|
    bytes.to_slice(len).fill { chars.to_slice.sample }
    {len, len}
  end
end

def random_string2(len) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new(Bytes.new(len) { chars.to_slice.sample })
end

def random_string3(len) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new (1..len).map { chars.to_slice.sample }.to_unsafe, len
end

puts random_string 20
puts random_string2 20
puts random_string3 20

require "benchmark"
Benchmark.ips(warmup: 4, calculation: 10) do |x|
  x.report "1" do
    random_string 20
  end

  x.report "2" do
    random_string2 20
  end

  x.report "3" do
    random_string3 20
  end
end
❯ crystal run --release case2.cr
T3F8uyLDm2ybcj8mzd6R
mDfrqybWRQx52Qf5aSwe
2APnY3wnyNSCGWZANT8o
1  15.41M ( 64.90ns) (± 2.16%)  48.0B/op        fastest
2  13.90M ( 71.95ns) (± 2.22%)  80.0B/op   1.11× slower
3  11.73M ( 85.23ns) (± 3.47%)   112B/op   1.31× slower

I tried this, and the first function is the fastest of them all.

I benchmarked mine (4) and it was way slower.

QS5f8mnDi5f7kLuwZsOS
UMHAEkDf4AkOKlC3hNgK
na2UJcgRoVEJkDqI9Rpo
MfEMqfo62eMlq1tqpKUz
1   6.37M (157.00ns) (± 0.80%)  48.0B/op        fastest
2   5.82M (171.74ns) (± 3.07%)  80.0B/op   1.09× slower
3   5.43M (184.15ns) (± 0.95%)   112B/op   1.17× slower
4   2.11M (474.29ns) (± 1.74%)   225B/op   3.02× slower

Thanks everyone.

The performance difference between the first and second is not very significant and fluctuates on my device. I added tests 4 and 5 to @aiac's test script. The final result showed slightly better performance for the fifth, which benefits greatly from the use of to_unsafe; after patching test 1 in the same way, it should be able to reach optimal performance.

def random_string(length : Int32) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new(length) do |bytes|
    bytes.to_slice(length).fill { chars.to_slice.sample }
    {length, length}
  end
end

def random_string2(length : Int32) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new(Bytes.new(length) { chars.to_slice.sample })
end

def random_string3(length : Int32) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new (1..length).map { chars.to_slice.sample }.to_unsafe, length
end

def random_string4(length : Int32) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new Pointer.malloc(3*length) { chars.to_unsafe[rand(62)] }, length
end

def random_string5(length : Int32) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new Bytes.new(length) { chars.to_unsafe[rand(62)] }
end

puts random_string 20
puts random_string2 20
puts random_string3 20
puts random_string4 20
puts random_string5 20

require "benchmark"
Benchmark.ips(warmup: 4, calculation: 10) do |x|
  x.report "1" do
    random_string 20
  end

  x.report "2" do
    random_string2 20
  end

  x.report "3" do
    random_string3 20
  end

  x.report "4" do
    random_string4 20
  end

  x.report "5" do
    random_string5 20
  end
end

output:

1  14.56M ( 68.66ns) (± 1.92%)  49.0B/op   1.63× slower
2  14.85M ( 67.35ns) (± 6.49%)  80.0B/op   1.60× slower
3  13.93M ( 71.79ns) (± 2.55%)   112B/op   1.70× slower
4  10.95M ( 91.29ns) (± 3.38%)   112B/op   2.16× slower
5  23.71M ( 42.17ns) (± 5.87%)  80.0B/op        fastest
1 Like

Just a note: shorter is not the same as simpler. I find @straight-shoota’s approach with String.build and the nested blocks to be much easier to read.

Also, unrelated to simplicity, I get a little worried when I see to_unsafe (which is the point of that method). Sometimes it’s the best solution, but usually it isn’t.

1 Like

Yes, I should have said "shorter" so you would feel better, but do you really care?

I didn’t say that to be a jerk about word choices. I said it because mistaking shortness for simplicity is a good way to cause problems for yourself, and I want to help people write code that doesn’t give them a headache later.


Anyway, after some testing, this is the fastest version of this method that I’ve found so far:

def random_string(length : Int32) : String
  # change chars length to 64 by adding dash and underscore
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

  # then all valid indices are in [0,63], so just get a bunch of bytes
  # and divide until they're guaranteed to be small enough
  # (this seems to be about as fast as a right shift; the compiler probably optimizes it)
  indices = Random::DEFAULT.random_bytes(length).map! { |v| v // 4 }

  # and then use the buffer-based string constructor to set the characters
  String.new(capacity: length) do |buffer|
    indices.each_with_index do |chars_index, buffer_index|
      buffer[buffer_index] = chars.byte_at(chars_index)
    end

    # return size and bytesize (might differ if chars included non-ASCII)
    {length, length}
  end
end
1 Like
Unhandled exception: Index out of bounds (IndexError)
  from /usr/share/crystal/src/string.cr:1353:22 in 

Oh, oops. Sorry about that. Mixed up the bits of >> 2 and // 4. I’ll fix the code above.

Looks like you caught the bug while I was writing this. The latency of this implementation is about half of the others on my machine, which is incredible. Previously, the fastest implementations in this thread ran in 60-61ns per iteration on my machine, and yours took 30ns.

I was able to whittle an implementation down to just over 59ns on my machine by writing a purpose-built String.random constructor that bypasses a few instructions in the regular string constructor. I was shaving off at most 2ns, so I didn’t post it because it took a lot of effort to shave off so little time.

Yours is so much faster because random_bytes operates on a full machine word at a time rather than byte-by-byte (asterite has talked in the past about how memcpy does the same thing for this reason). But I also noticed it was doing an extra heap allocation: 80B per iteration vs 48B for the previously fastest implementations.
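As an aside, a rough stand-alone sketch (mine, not part of the thread's benchmark) to isolate that allocation cost is to fill a freshly heap-allocated Bytes versus a stack buffer of the same size:

require "benchmark"

Benchmark.ips do |x|
  # heap: allocates a new 20-byte Bytes every iteration
  x.report "heap Bytes" do
    Random::DEFAULT.random_bytes(Bytes.new(20))
  end

  # stack: uses an uninitialized StaticArray, no heap allocation
  x.report "stack buffer" do
    buf = uninitialized UInt8[20]
    Random::DEFAULT.random_bytes(buf.to_slice)
  end
end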

I know from experience that, on my machine, an extra object allocation adds 10-15ns to the average iteration time. So if it’s this much faster even with an extra allocation, I’m sure we can improve it further, at least for strings up to a given size. I iterated on it and came up with this:

def random_string8(length : Int32, random = Random::DEFAULT) : String
  # change chars length to 64 by adding dash and underscore
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
  if length <= 1024
    buffer = uninitialized UInt8[1024]
    bytes = buffer.to_slice[0...length]
  else
    bytes = Bytes.new(length)
  end

  # then all valid indices are in [0,63], so just get a bunch of bytes
  # and divide until they're guaranteed to be small enough
  # (this seems to be about as fast as a right shift; the compiler probably optimizes it)
  random.random_bytes(bytes)
  bytes.map! { |v| v % chars.bytesize }

  # and then use the buffer-based string constructor to set the characters
  String.new(capacity: length) do |buffer|
    bytes.each_with_index do |chars_index, buffer_index|
      buffer[buffer_index] = chars.byte_at(chars_index)
    end

    # return size and bytesize (might differ if chars included non-ASCII)
    {length, length}
  end
end

If the string you want is less than or equal to 1KB, it will put the buffer on the stack, avoiding a heap allocation. This is how much faster it is than anything else in this thread on my machine:

1  16.63M ( 60.12ns) (± 1.08%)  48.0B/op   4.04× slower
2  15.78M ( 63.39ns) (± 0.91%)  80.0B/op   4.26× slower
3  14.30M ( 69.95ns) (± 0.90%)   112B/op   4.70× slower
4   5.54M (180.64ns) (± 0.95%)   112B/op  12.14× slower
5  15.15M ( 66.00ns) (± 1.01%)  80.0B/op   4.44× slower
6  16.20M ( 61.72ns) (± 1.07%)  48.0B/op   4.15× slower
7  49.60M ( 20.16ns) (± 0.91%)  48.0B/op   1.35× slower
8  67.21M ( 14.88ns) (± 1.19%)  48.0B/op        fastest

7 is my implementation mentioned above, which uses aspects of your implementation and still falls behind.

Another thing that helps your implementation is that you use a 64-byte alphabet. The latency increases by about 50% (~22ns) when I change it to a 62-byte alphabet.

Full benchmark
def random_string1(len) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new(len) do |bytes|
    bytes.to_slice(len).fill { chars.to_slice.sample }
    {len, len}
  end
end

def random_string2(len) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new(Bytes.new(len) { chars.to_slice.sample })
end

def random_string3(len) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new (1..len).map { chars.to_slice.sample }.to_unsafe, len
end

def random_string4(length : Int32) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new Pointer.malloc(3*length) { chars.to_unsafe[rand(62)] }, length
end

def random_string5(length : Int32) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
  String.new Bytes.new(length) { chars.to_unsafe[rand(62)] }
end

def random_string6(length : Int32) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".to_unsafe

  String.new length do |buffer|
    length.times { |index| buffer[index] = chars[rand(62)] }
    {length, length}
  end
end

def random_string7(length : Int32) : String
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".to_slice

  String.random length, chars
end

def random_string8(length : Int32, random = Random::DEFAULT) : String
  # change chars length to 64 by adding dash and underscore
  chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
  if length <= 1024
    buffer = uninitialized UInt8[1024]
    bytes = buffer.to_slice[0...length]
  else
    bytes = Bytes.new(length)
  end

  # then all valid indices are in [0,63], so just get a bunch of bytes
  # and divide until they're guaranteed to be small enough
  # (this seems to be about as fast as a right shift; the compiler probably optimizes it)
  random.random_bytes(bytes)
  bytes.map! { |v| v % chars.bytesize }

  # and then use the buffer-based string constructor to set the characters
  String.new(capacity: length) do |buffer|
    bytes.each_with_index do |chars_index, buffer_index|
      buffer[buffer_index] = chars.byte_at(chars_index)
    end

    # return size and bytesize (might differ if chars included non-ASCII)
    {length, length}
  end
end

puts random_string1 20
puts random_string2 20
puts random_string3 20
puts random_string4 20
puts random_string5 20
puts random_string6 20
puts random_string7 20
puts random_string8 20

require "benchmark"
Benchmark.ips do |x|
  x.report "1" { random_string1 20 }
  x.report "2" { random_string2 20 }
  x.report "3" { random_string3 20 }
  x.report "4" { random_string4 20 }
  x.report "5" { random_string5 20 }
  x.report "6" { random_string6 20 }
  x.report "7" { random_string7 20 }
  x.report "8" { random_string8 20 }
end

def String.random(bytesize : Int32, alphabet : Bytes, random = Random::DEFAULT)
  str = GC.malloc_atomic(bytesize.to_u32 + HEADER_SIZE + 1).as(UInt8*)
  buffer = str.as(String).to_unsafe
  bytes = Bytes.new(buffer, bytesize)
  random.random_bytes bytes
  bytes.map! { |byte| byte % alphabet.bytesize }
  alphabet = alphabet.to_unsafe

  bytesize.times do |index|
    buffer[index] = alphabet[buffer[index]]
  end

  buffer[bytesize] = 0_u8

  set_crystal_type_id(str)
  str = str.as(String)
  str.initialize_header(bytesize.to_i, bytesize.to_i)
  str
end

module Benchmark
  def self.batch_ips(*args, **kwargs)
    BatchIPS.new(*args, **kwargs)
  end

  class BatchIPS
    def report(msg = "")
    end
  end
end
1 Like
1   5.91M (169.13ns) (± 1.11%)  48.0B/op   3.18× slower
2   5.69M (175.70ns) (± 1.03%)  80.0B/op   3.30× slower
3   5.11M (195.74ns) (± 0.93%)   112B/op   3.67× slower
4   4.62M (216.39ns) (± 2.49%)   112B/op   4.06× slower
5  10.28M ( 97.23ns) (± 6.78%)  80.0B/op   1.83× slower
6  10.90M ( 91.71ns) (± 1.45%)  49.0B/op   1.72× slower
7  11.85M ( 84.40ns) (± 0.92%)  49.0B/op   1.58× slower
8  18.77M ( 53.27ns) (± 1.40%)  48.0B/op        fastest

This is how random_string8 performs on my i5-7400.

Have you tried replacing the % chars.bytesize with >> 2, // 4, or & 63u8? In my super basic testing yesterday, they all seemed about equivalent in terms of performance, but they all seemed faster than modulo, and I think you lose the benefit of a length-64 character set if you call bytesize anyway.

I think & 63u8 should be equivalent to % chars.bytesize due to the character set size, but since you’re randomizing I don’t think it matters much.
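A tiny throwaway check (not from the thread) that the mask and the modulo agree for every byte value:

256.times do |i|
  b = i.to_u8
  raise "mismatch at #{b}" unless (b & 63_u8) == (b % 64_u8)
end
puts "ok"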

Using division or bit shifts only works for 64-byte alphabets. I used % because I was playing around with both 62- and 64-byte alphabets after noticing a difference between them, so the flexibility ended up being more important.

Modulo can still be fast, though. On x86 and x86-64, at least, division and modulo operations are both performed in the same CPU instruction and there isn’t a way to get one without the other. I’m running an ARM CPU, which I’m less familiar with in general, but I assume the situation is similar.
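(Crystal exposes both results at once through Int#divmod, for what it's worth:)

# quotient and remainder in one call
q, r = 200.divmod(62)
puts({q, r}) # => {3, 14}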

At the very least, it didn’t change the performance in a notable way between value % chars.bytesize and value // 4 when the alphabet was 64 bytes. I haven’t looked at what LLVM IR is emitted with optimizations enabled, but since chars is a string literal, chars.bytesize is known at compile time, so LLVM can likely hardcode it as a numeric literal during codegen, and dividing by a power of 2 can then be optimized into a cheaper instruction like a bit shift, which would explain why the two performed the same.

I just ran a quick benchmark and discovered two things, one of which is surprising and the other of which validates a hypothesis I’ve had for a while:

                   division 795.05k (  1.26µs) (± 0.80%)  0.0B/op   2.05× slower
                     modulo   1.06M (944.98ns) (± 0.88%)  0.0B/op   1.54× slower
division by numeric literal   1.63M (613.21ns) (± 0.75%)  0.0B/op        fastest
  modulo by numeric literal   1.48M (676.91ns) (± 0.83%)  0.0B/op   1.10× slower
  • % is actually faster than integer division on my CPU (this was the surprising thing)
  • whether you use a numeric literal has an even larger impact (this confirms my hypothesis)
Benchmark
require "benchmark"

values = [0] of Int64 # circumvent benchmark-specific optimizations
Benchmark.ips do |x|
  iterations = 1_000
  # Avoid LLVM optimizations for literals
  numerator = rand(Int64).to_s.to_i64
  denominator = "10".to_i64

  x.report "division" { iterations.times { values[0] = numerator // denominator } }
  x.report "modulo" { iterations.times { values[0] = numerator % denominator } }
  x.report "division by numeric literal" { iterations.times { values[0] = numerator // 10 } }
  x.report "modulo by numeric literal" { iterations.times { values[0] = numerator % 10 } }
end

Integer division by a constant is often replaced by LLVM with a wrapping multiplication plus a bunch of shifts, even for divisors that aren’t powers of 2; you might want to look at the assembly output on Compiler Explorer. For example, here is division and modulo by 62: Compiler Explorer
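If you'd rather inspect it locally, a minimal sketch like this (my own example, compiled with optimizations enabled) keeps the symbols easy to find so you can see what LLVM emits for // 62 and % 62:

# Exported C-ABI functions are not mangled, so they are easy to spot
# in the assembly output.
fun div62(x : UInt8) : UInt8
  x // 62_u8
end

fun mod62(x : UInt8) : UInt8
  x % 62_u8
end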

3 Likes

I created a new base58.cr shard to reuse the awesome code in this discussion. Thanks, all.

3 Likes

Related: Suggestion: Add Random#alphanumeric · Issue #8993 · crystal-lang/crystal · GitHub