What’s the motivation behind the 4KB buffer in IO.copy? I guess it’s a trade-off between memory and the number of syscalls/CPU usage, but might 4KB be a little too restrictive for modern systems? I don’t know if Crystal is used much in embedded systems. A larger buffer (up to 128KB) performs up to 5 times better:
require "benchmark"
SIZES = { 4096, 8196, 16384, 131072, 262144 }
class IO
{% for n in SIZES %}
def self.copy_{{n}}(src, dst, limit : Int) : UInt64
raise ArgumentError.new("Negative limit") if limit < 0
limit = limit.to_u64
buffer = uninitialized UInt8[{{n}}]
remaining = limit
while (len = src.read(buffer.to_slice[0, Math.min(buffer.size, Math.max(remaining, 0))])) > 0
dst.write buffer.to_slice[0, len]
remaining -= len
end
limit - remaining
end
{% end %}
end
File.open("/dev/zero") do |zero|
File.open("/dev/null", "w") do |null|
Benchmark.ips do |x|
{% for n in SIZES %}
x.report("copy 1GB with {{n}}B buffer") do
IO.copy_{{n}}(zero, null, 1_048_576)
end
{% end %}
end
end
end
No motivation at all. 4KB is a typical number for a stack buffer. Feel free to open a PR on GitHub.
That said, the larger the stack buffer, the longer it takes LLVM to compile the program. However, I tried it with 128KB (I guess that’s 131072?) and it made no difference, so maybe that number isn’t that big for LLVM, or they fixed something…
We can change the default to 8192 to align things.
I would like to have a way for people to tweak the default buffer size affecting IO.copy and IO::Buffered. Maybe a macro that could offer an API like IO.set_default_buffer_size 262144.
Since it is in the prelude, and it needs to be a constant to be used as an argument for StaticArray, it’s a bit trickier. But it might be doable. As long as the API to change the default buffer size is nice, I’m satisfied.
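A rough sketch of how that might look (the names IO.set_default_buffer_size and DEFAULT_BUFFER_SIZE are made up for illustration, not an existing API; the key point is that the macro has to produce a constant so it can be used as a StaticArray size):

class IO
  # Hypothetical macro: expands to a constant, because StaticArray sizes
  # must be known at compile time.
  macro set_default_buffer_size(size)
    class IO
      DEFAULT_BUFFER_SIZE = {{size}}
    end
  end
end

IO.set_default_buffer_size 262144

class IO
  # IO.copy (and IO::Buffered) could then size their stack buffers from it:
  def self.copy(src, dst) : UInt64
    buffer = uninitialized UInt8[DEFAULT_BUFFER_SIZE]
    count = 0_u64
    while (len = src.read(buffer.to_slice)) > 0
      dst.write buffer.to_slice[0, len]
      count += len
    end
    count
  end
end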
Allow defining copy methods for specific buffer sizes
I was thinking of something like this:
class IO
  macro def_copy(buffer_size)
    def self.copy_{{buffer_size}}(src, dst)
      # ...
    end
  end
end

IO.def_copy 8192
IO.copy_8192(src, dst)
IO.set_default_buffer_size
The one Brian mentioned, which I think is nice. However, since this is a global setting, maybe a shard wants one size and your code wants a different one: who wins? Load order, which is not intuitive.
Allow configuring the default IO.copy buffer size with an environment variable at compile time
That way there’s only one place you can tweak this setting. And you can configure it with an env var at compile time.
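For concreteness, a sketch of how that could look (the variable name CRYSTAL_IO_BUFFER_SIZE is made up; env(...) inside a macro expression reads an environment variable at compile time, so the value still ends up as a constant usable as a StaticArray size):

class IO
  # Falls back to 4096 when the env var is not set at build time.
  DEFAULT_BUFFER_SIZE = {{ (env("CRYSTAL_IO_BUFFER_SIZE") || "4096").id }}
end

# Build with, e.g.:
#   CRYSTAL_IO_BUFFER_SIZE=131072 crystal build app.cr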
The con of using an env var is that I don’t want to abuse that feature to solve this kind of thing.
The other day waj and I were throwing around some ideas about how we could inject configuration values like this and others, but that’s a story for the future.
If the load order con of the macro approach is too much, I am happy to settle on the env var solution.
The load order problem is only a problem if the buffer size actually matters (in any way other than execution speed).
Even if some shard sets a certain buffer size for any reason, it will still work if the user overrides it with a value earlier in the load order.
If that is the case, I would go with a sensible default, overridable in code with a macro as suggested above, but with a check that ensures the first try to set a new value wins.
If a shard sets a value, it gets used, unless the user sets his own value before requiring the shard.
Everyone is happy?
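As a rough sketch of that first-one-wins behavior (again with made-up names), the macro could simply skip redefining the constant once it exists:

class IO
  macro set_default_buffer_size(size)
    # Only the first call defines the constant; later calls are ignored.
    {% unless IO.has_constant?("DEFAULT_BUFFER_SIZE") %}
      class IO
        DEFAULT_BUFFER_SIZE = {{size}}
      end
    {% end %}
  end
end

IO.set_default_buffer_size 262144 # user code, required first: wins
IO.set_default_buffer_size 8192   # later call from a shard: ignored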