What’s the motivation behind the 4KB buffer in IO.copy? I guess it’s a trade-off between memory and number of syscalls/cpu usage, but might 4KB be a little too restrictive for modern systems? I don’t know if crystal is used much in embedded systems? A larger buffer (up to 128KB) performs up to 5 times better:
require "benchmark"
SIZES = { 4096, 8196, 16384, 131072, 262144 }
class IO
{% for n in SIZES %}
def self.copy_{{n}}(src, dst, limit : Int) : UInt64
raise ArgumentError.new("Negative limit") if limit < 0
limit = limit.to_u64
buffer = uninitialized UInt8[{{n}}]
remaining = limit
while (len = src.read(buffer.to_slice[0, Math.min(buffer.size, Math.max(remaining, 0))])) > 0
dst.write buffer.to_slice[0, len]
remaining -= len
end
limit - remaining
end
{% end %}
end
File.open("/dev/zero") do |zero|
File.open("/dev/null", "w") do |null|
Benchmark.ips do |x|
{% for n in SIZES %}
x.report("copy 1GB with {{n}}B buffer") do
IO.copy_{{n}}(zero, null, 1_048_576)
end
{% end %}
end
end
end
No motivation at all. 4KB is a typical number for a stack buffer. Feel free to open a PR in GitHub.
That said, the larger a stack buffer, the longer it takes LLVM to compile the program. However, I tried it with 128KB (I guess that’s 131072?) and it made no difference, so maybe that number isn’t that big for LLVM or they fixed something…
we can change the default to 8192 to align things.
I would like to have a way for people to tweak the default buffer size affecting IO.copy and IO::Buffered. Maybe a macro that could offer an api IO.set_default_buffer_size 262144
Since it is in the prelude and it needs to be a constant to use it as an argument for StaticArray it’s a bit trickier. But might be doable. As long as the API to change the default buffer size is nice, I’m satisfied.
Allow defining specific copy definitions for different buffer sizes
I was thinking of something like this:
module IO
macro def_copy(buffer_size)
def self.copy_{{buffer_size))(src, dst)
# ...
end
end
end
IO.def_copy 8192
IO.copy_8192(src, dst)
IO.set_default_buffer_size
The one Brian mentioned, which I think is nice. However, since this is a global thing maybe a shard wants to do it one way, maybe your code wants a different size, who wins? Load order, which is not intuitive.
Allow configuring the default IO.copy buffer size with environment var at compile time
That way there’s only one place you can tweak this setting. And you can configure it with an env var at compile time.
The cons for using an env var is that I don’t want to abuse of that feature solve this kind of thing.
The other day we’ve been having some ideas with waj about how we could inject configuration values like this and others. But is a story for the future.
If the cons of the load order in the macro is too much, I am happy to settle on the env var solution.
The load order problem is only a problem if the buffer size actually matters (in any way other than execution speed)
Even if some shard sets a certain butter size for any reason, it will still work if the user overrides it with a value earlier in the load order.
If that is the case, I would go with a sensible default, overridable in code with a macro as suggested above, but with a check that ensures the first try to set a new value wins.
If a shard sets a value, it gets used, unless the user sets his own value before requiring the shard.
everyone is happy?
Thanks @bcardiff for the link.
Yeha this is much needed. Right now with such a small buffer size standard HTTP operations when reading files and pages abused the CPU because of the constant loops around new buffer allocation and and again and again.
It would be wonderful to just have a way to set the wanted buffer, and yeha a default which is higher then 4096 makes a LOT of sense in today’s world
One way to tweak the IO buffer size could be to do this:
class IO
macro finished
{% unless @type.constant("BUFFER_SIZE") %}
BUFFER_SIZE = 8192
{% end %}
end
def self.copy(src, dst) : Int64
puts "copy with buffer size: #{BUFFER_SIZE}" # <= just to verify the value is correct
buffer = uninitialized UInt8[BUFFER_SIZE]
count = 0_i64
while (len = src.read(buffer.to_slice).to_i32) > 0
dst.write buffer.to_slice[0, len]
count &+= len
end
count
end
end
Then the user can simply do this:
class IO
BUFFER_SIZE = 131072
end
a = IO::Memory.new
b = IO::Memory.new
IO.copy(a, b)
There are several distinct angles here, roughly ordered by effort:
Customizing individual buffers
Adjusting the global configuration to a maybe more appropriate default than 4K
Customizing the global configuration
Numbers 2. and 3. need a global configuration (i.e. a constant) instead of magic numbers. This would be a great idea anyways.
And then there’s the question of whether a global configuration applies to all buffer applications, or if there should be separate configurations.
To answer that, we’ll need a list of all buffer applications in stdlib and think about whether it makes sense to treat them all the same or group them into different use cases.
The easiest implementation is certainly number 1., by adding a parameter for passing in an allocated buffer. This allows the user to choose the size and whether it’s heap or stack memory.
My opinion: please increase the default size to a number that other languages use. Go uses 32 * 1024 = 32768. C# uses 81920. Crystal uses 4096, which is too small these days.
I would increase it to 32768 permanently, without allowing configuration. Then I’d say if anyone still complains. If so, I’d introduce a configuration option. But configuring this and using static arrays is hard! For IO.copy maybe we can introduce an IO.copy(src, dst, buffer_size) macro, but for String#gets_to_end it’s hard because it’s an instance method.
Another opinion: in general when you design a product and there are multiple options, the thing you shouldn’t do is give the user options. Instead, you should try to choose the best option you can think of. Only when that’s not enough you should introduce options.
If we want such customizability, I’d just offer a buffer argument which the user can use with any kind of buffer. That removes the need for customizing the internal static array.
Golang provides that same option with io.CopyBuffer.
I agree with @asterite that the first action can be increasing to 32k most of the buffers (especially network based one).
On a latter date introducing some global or method bound option to customize it might be beneficial as well, especially if your application reads large files over the network or from disk, or if the application want to optimize CPU/Mem.
Right now I see critical CPU consumption issues when handling .gets_to_end because of the low buffer size, this is relevant to all HTTP operations and is felt strongly when using multiple Fibers because it will hang all operations while looping in mem allocation instead of actually allowing other Fibers to work while doing IO operations which is the whole point.
I would love to contribute a PR of mass changing all buffers to 32k if that’s needed