This is great!
Do you think you could send that, or something similar, as a PR to the main crystal repository? Ideally there shouldn’t be a need to have multiple UUID implementations if we can have a good one in the standard library.
Actually, nevermind, I can try to optimize it and see if I can reach the same level of performance as yours. Then we wouldn’t need to have multiple copies of UUID.
I modified your code a little and here are my results:
bash-3.2$ ./benchmark
Crystal 0.33.0 (2020-02-14)
LLVM: 9.0.1
Default target: x86_64-apple-macosx
UUIX: 2.86M (349.14ns) (± 2.04%) 180B/op fastest
Crystal UUID: 798.78k ( 1.25µs) (± 1.18%) 0.0B/op 3.59× slower
Here is the code:
module UUIX
VERSION = "0.1.0"
SLIDE = 3
@@SIZE : Int32 = 1024
@@IDX : Int32 = 0
@@BUFFER : Bytes = Bytes.empty
@@HEX : Array(String) = Array(String).new(256) do |i|
(i + 256).to_s(16)[1..]
end
def self.random
if @@BUFFER.size == 0 || ((@@IDX + 16 + SLIDE) > @@SIZE)
@@BUFFER = Random.new.random_bytes(@@SIZE)
@@IDX = 0
end
arr = @@BUFFER[@@IDX..(@@IDX + 16)]
@@IDX += SLIDE
String.build do |str|
(0..3).each do |idx|
str << @@HEX[arr[idx]]
end
str << '-'
str << @@HEX[arr[4]]
str << @@HEX[arr[5]]
str << '-'
str << @@HEX[arr[6] & 15 | 64]
str << @@HEX[arr[7]]
str << '-'
str << @@HEX[arr[8] & 63 | 128]
str << @@HEX[arr[9]]
str << '-'
arr[10..15].each do |val|
str << @@HEX[val]
end
end
end
end
Size of the buffer does not affect speed much, I tried different sizes and on 64KB it has a little bit of boost but much below 0.1% so I stick with 1024 bytes.
Also in your code it is using Random and not Random::Secure so it is not much cryptographically secure. You have to modify your code a little by adding ::Secure to the Random and remove new.
So it will look like this: @@BUFFER = Random::Secure.random_bytes(@@SIZE)
With Random::Secure my code is little bit slower but still nice performance boost:
bash-3.2$ ./benchmark
Crystal 0.33.0 (2020-02-14)
LLVM: 9.0.1
Default target: x86_64-apple-macosx
UUIX: 2.12M (471.14ns) (± 3.60%) 180B/op fastest
Crystal UUID: 782.34k ( 1.28µs) (± 2.41%) 0.0B/op 2.71× slower
Look at my modification in this thread. It has quite a good performance. SLIDE allows to define a window to shift. It still makes GUID unique as it is quite low probability that shifted GIUDS will be identical but it decreases amount of generation of random data.
It can be modified to run even faster by allocating fixed Bytes slice and set it directly there and then at the end to make string out of Bytes.
How impressive! I think I’ll add these changes to the library. Give me your GitHub user to add you to the README
Just a note: the optimization might be good but it’s not thread safe. If two fibers want to produce a random value, and it’s running in multithreading, then a data race might happen. You could introduce a mutex but then the performance will be degraded.
For now I will only modify the final part of the algorithm: the generation of the string.
I made it even faster:
module UUIX
VERSION = "0.1.0"
SLIDE = 1
@@SIZE : Int32 = 4096
@@IDX : Int32 = 0
@@BUFFER : Bytes = Bytes.empty
@@HEX_VALUES : StaticArray(UInt8, 16) = StaticArray[0x30_u8, 0x31_u8, 0x32_u8, 0x33_u8, 0x34_u8, 0x35_u8,
0x36_u8, 0x37_u8, 0x38_u8, 0x39_u8, 0x61_u8, 0x62_u8,
0x63_u8, 0x64_u8, 0x65_u8, 0x66_u8]
macro binToHex (result, pos, hexValue)
{{result}}[{{pos}}] = @@HEX_VALUES[{{hexValue}} >> 4 & 0xF]
{{result}}[{{pos+1}}] = @@HEX_VALUES[{{hexValue}} & 0xF]
end
def self.random
result = Bytes.new(37)
if @@BUFFER.size == 0 || ((@@IDX + 16 + SLIDE) > @@SIZE)
@@BUFFER = Random::Secure.random_bytes(@@SIZE)
@@IDX = 0
end
arr = @@BUFFER[@@IDX..(@@IDX + 16)]
@@IDX += SLIDE
binToHex(result, 0, arr[0])
binToHex(result, 2, arr[1])
binToHex(result, 4, arr[2])
binToHex(result, 6, arr[3])
result[8] = 0x2d_u8
binToHex(result, 9, arr[4])
binToHex(result, 11, arr[5])
result[13] = 0x2d_u8
binToHex(result, 14, (arr[6] | 64))
binToHex(result, 16, arr[7])
result[18] = 0x2d_u8
binToHex(result, 19, (arr[8] | 128))
binToHex(result, 21, arr[9])
result[23] = 0x2d_u8
binToHex(result, 24, arr[10])
binToHex(result, 26, arr[11])
binToHex(result, 28, arr[12])
binToHex(result, 30, arr[13])
binToHex(result, 32, arr[14])
binToHex(result, 34, arr[15])
String.new(result)
end
end
It gives 6x speed improvement for Random::Secure:
bash-3.2$ ./benchmark
Crystal 0.33.0 (2020-02-14)
LLVM: 9.0.1
Default target: x86_64-apple-macosx
UUIX: 4.67M (213.98ns) (± 3.00%) 113B/op fastest
Crystal UUID: 783.57k ( 1.28µs) (± 2.15%) 0.0B/op 5.96× slower
and it is faster for Random.
My GitHub userid is skuznetsov
It is possible to put it behind the mutex.
In this case we have to create a bigger random buffer so it will decrease the probability of locking.
My last code is very small and fast so it will make locking probability even less.
This is repost of my response to https://github.com/lukeed/uuid/issues/1
I was able to achieve even more impressive speed on my MacBook:
Crystal 0.33.0 (2020-02-14)
LLVM: 9.0.1
Default target: x86_64-apple-macosx
UUIX: 5.92M (169.04ns) (±10.99%) 113B/op fastest
Crystal UUID: 770.02k ( 1.30µs) (± 3.55%) 0.0B/op 7.68× slower
This is with secure random buffer equal to 16MB and 1 byte shift (instead of original 16 as it is still unique and probability of generation of the same sequence is even less possible than with shift to 16) so it is quite good results with just 169 nanoseconds per one hash (including random data generation).
Also by some reason 16 MB buffer is fastest. sizes above and below are giving lower results.
Hi all! I’m author of the JavaScript module – thank you @krthr for porting this over :)
In JS-land, pre-allocating a 4096-length Buffer (versus, say, 1024) had a measurable effect in performance. This is because (again, in JS land), it’s quite expensive to instantiate and fill the ArrayBuffer
with random values. We have access to a crypto
module that is a CSPRNG.
I’m not sure of the mechanics or internals of Crystal (one day I’d like to!) but I’m sure this is why the port includes the magic 4096 number.
This also leads me to ask: What kind of randomizer is Random.new.random_bytes
? Is it cryptographically safe, or is it the equivalent of JS’ Math.random()
?
If it’s effectively Math.random
then I have another solution that yielded 50% improvement on my side. Maybe it’ll have a similar effect in Crystal?
Edit: Should have read the thread title
Thanks~!
Instead of doing Bytes.new
, then String.new
with that, you can just do String.new(size) do |buffer|
and get a pointer to the string’s buffer. That way you avoid allocating twice.
Try this (or your newer code, but with this modification). It should be a bit faster:
module UUIX
VERSION = "0.1.0"
SLIDE = 1
@@SIZE : Int32 = 4096
@@IDX : Int32 = 0
@@BUFFER : Bytes = Bytes.empty
@@HEX_VALUES : StaticArray(UInt8, 16) = StaticArray[0x30_u8, 0x31_u8, 0x32_u8, 0x33_u8, 0x34_u8, 0x35_u8,
0x36_u8, 0x37_u8, 0x38_u8, 0x39_u8, 0x61_u8, 0x62_u8,
0x63_u8, 0x64_u8, 0x65_u8, 0x66_u8]
macro binToHex(result, pos, hexValue)
{{result}}[{{pos}}] = @@HEX_VALUES[{{hexValue}} >> 4 & 0xF]
{{result}}[{{pos + 1}}] = @@HEX_VALUES[{{hexValue}} & 0xF]
end
def self.random
if @@BUFFER.size == 0 || ((@@IDX + 16 + SLIDE) > @@SIZE)
@@BUFFER = Random::Secure.random_bytes(@@SIZE)
@@IDX = 0
end
arr = @@BUFFER[@@IDX..(@@IDX + 16)]
@@IDX += SLIDE
String.new(36) do |result|
binToHex(result, 0, arr[0])
binToHex(result, 2, arr[1])
binToHex(result, 4, arr[2])
binToHex(result, 6, arr[3])
result[8] = 0x2d_u8
binToHex(result, 9, arr[4])
binToHex(result, 11, arr[5])
result[13] = 0x2d_u8
binToHex(result, 14, (arr[6] | 64))
binToHex(result, 16, arr[7])
result[18] = 0x2d_u8
binToHex(result, 19, (arr[8] | 128))
binToHex(result, 21, arr[9])
result[23] = 0x2d_u8
binToHex(result, 24, arr[10])
binToHex(result, 26, arr[11])
binToHex(result, 28, arr[12])
binToHex(result, 30, arr[13])
binToHex(result, 32, arr[14])
binToHex(result, 34, arr[15])
{36, 36}
end
end
end
require "benchmark"
require "uuid"
Benchmark.ips do |x|
x.report("UUIX:") do
UUIX.random
end
x.report("Crystal UUID:") do
UUID.random
end
end
It’s equivalent to Math.random()
. But UUID in Crystal is cryptographically safe, it uses Random::Secure
by default which is not equivalent to Math.random()
.
Got it, thanks! So then the Crystal fork would/should be modified for parity to begin with – that is, if it can’t be merged into core.
Here are new JS numbers running w/ 4096 and a slide=1:
@lukeed/next x 6,447,983 ops/sec ±0.31% (93 runs sampled)
@lukeed/uuid x 2,544,311 ops/sec ±0.30% (94 runs sampled)
Thanks for the feedback!
Is it okay that asking uuids reutilizes many bytes from previous uuids? It sounds a bit… not random?
It is still unique though.
I don’t know. Maybe if I know the GUID of an object I can try guessing other GUIDs by popping the first letter and trying 16 different combinations. Doesn’t sound very secure to me…