Hi, I have been scratching my head over this piece of code:
require "http"

server = HTTP::Server.new do |context|
  name = nil
  file = nil
  HTTP::FormData.parse(context.request) do |part|
    # Buffer the whole part in memory (this is the step that costs memory).
    io = IO::Memory.new
    if body = part.body
      IO.copy(body, io)
    end
    # Rewind so the copy below starts from the first buffered byte.
    io.rewind
    file = File.tempfile("upload") do |tmp|
      IO.copy(io, tmp)
    end
  end
  if file
    context.response << file.path
  end
end

server.bind_tcp 8085
server.listen
The problem is that when the body of the part is copied into the IO::Memory, memory usage increases by double the size of the uploaded file. So if I upload a 25MB file using curl -F "file=@/home/user/somefile.mp4" http://localhost:8085/, memory usage increases by up to ~50MB (double the file size).
The exact same thing happens when I do:
HTTP::FormData.parse(context.request) do |part|
  part.body.getb_to_end
  file = File.tempfile("upload") do |file|
    IO.copy(part.body, file)
  end
end
However, when I use this instead:
HTTP::FormData.parse(context.request) do |part|
  file = File.tempfile("upload") do |file|
    IO.copy(part.body, file)
  end
end
It uses 0 additional memory.
I would like to understand why Crystal uses double the memory when using:
io = IO::Memory.new
if body = part.body
  IO.copy(body, io)
end
I think the gist of it is that when you call part.body.getb_to_end or copy the body into an IO::Memory, you’re loading the whole content of the file into memory, so of course more memory needs to be allocated to hold it. If you copy the data directly from the request body (via part.body) to a file, it can be done in a streaming fashion, hence no extra memory is really needed.
As to why it uses double the memory of the file, it’s likely because the buffer grows in powers of two, iirc to reduce the total number of reallocations, since each one is a non-trivial amount of work. The IO::Memory.new constructor accepts an initial capacity, so you could pass a better guess than the default of 64 to possibly reduce the total memory usage. I.e. if your file is 25MB, try setting the initial capacity to 26MB and see how the memory usage looks after that.
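To make the arithmetic concrete, here is a small sketch. It assumes that a growing IO::Memory rounds its capacity up to the next power of two (an assumption about the standard library's internals, not something confirmed in this thread), and it uses Math.pw2ceil only to illustrate that rounding. The 25MB figure comes from the question.

```crystal
upload_size = 25 * 1024 * 1024 # 26_214_400 bytes, the upload from the question

# Assumption: a growing IO::Memory rounds its capacity up to a power of two.
grown_capacity = Math.pw2ceil(upload_size)
puts grown_capacity # 33554432 bytes, i.e. 32 MiB

# Passing the expected size up front reserves it once, with no regrowth.
io = IO::Memory.new(upload_size)
puts io.bytesize # 0: capacity is reserved, but nothing has been written yet
```

With the default capacity of 64, the buffer would be reallocated at every doubling step on the way to 32 MiB. Whether the old and new buffers briefly coexist during a reallocation depends on the allocator; that could explain a peak near 50MB, but it is not measured here.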
Since part.body is copied into io, it seems somewhat obvious that memory usage would double.
So, we need to think carefully about the true intention behind the question.
I think the real question might be:
“Why doesn’t memory usage double when writing directly to a file?”
I asked deepwiki about this and got the following answer:
IO.copy uses a 32KB memory buffer.
It repeatedly performs read → write loops using that buffer.
Every 32KB written triggers one LibC.write system call.
Thanks to this, even very large files can be written using only 32KB of memory.
If LibC.write cannot proceed immediately because the descriptor would block, the event loop (libevent) waits until writing becomes possible.
(This is one of Crystal’s popular features compared to Rust)
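The read → write loop described above can be sketched roughly as follows. This is not the standard library's implementation of IO.copy, just a minimal illustration of the same idea; copy_in_chunks is a hypothetical helper name, and IO::Memory objects stand in for the request body and the tempfile.

```crystal
# Hypothetical helper: copies src to dst through one fixed-size buffer,
# so memory use stays at the buffer size regardless of how large src is.
def copy_in_chunks(src : IO, dst : IO, chunk_size = 32 * 1024) : Int64
  buffer = Bytes.new(chunk_size)
  total = 0_i64
  # IO#read returns the number of bytes read, and 0 at end of stream.
  while (len = src.read(buffer)) > 0
    dst.write(buffer[0, len])
    total += len
  end
  total
end

src = IO::Memory.new("x" * 100_000) # stand-in for part.body
dst = IO::Memory.new                # stand-in for the tempfile
copied = copy_in_chunks(src, dst)
puts copied
```

Only the one buffer is ever allocated, which is why the direct part.body to file copy shows no visible growth in memory usage.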
Yep, indeed, that’s it. When using IO::Memory.new(size) instead of letting it grow depending on the data being inserted, it uses exactly the amount of memory passed to IO::Memory.new. Thanks!
If anyone is wondering why I asked this, it was because I wanted to read just the first bytes of an uploaded file to check its magic bytes and detect its extension, and I was able to do it this way: upload.cr#L21-L23. It is not directly related to this issue, but I had this question when trying to solve that problem.
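For readers who cannot follow the link, here is one possible way to do that kind of sniffing. It is not the code from upload.cr, only a sketch under my own assumptions: sniff_and_copy is a hypothetical helper, the 16-byte header size is arbitrary, and IO::Memory objects stand in for part.body and the tempfile.

```crystal
# Hypothetical helper: reads at most header_size bytes into memory,
# writes them to dst, then streams the rest of body without buffering it.
def sniff_and_copy(body : IO, dst : IO, header_size = 16) : Bytes
  header_io = IO::Memory.new(header_size)
  IO.copy(body, header_io, header_size) # copies at most header_size bytes
  header = header_io.to_slice
  dst.write(header)
  IO.copy(body, dst) # the remainder is streamed in small chunks
  header
end

png_magic = Bytes[0x89, 0x50, 0x4E, 0x47] # first four bytes of a PNG file
body = IO::Memory.new                     # stand-in for part.body
body.write(png_magic)
body.write(Bytes.new(1000))
body.rewind

dst = IO::Memory.new # stand-in for the tempfile
header = sniff_and_copy(body, dst)
is_png = header.size >= 4 && header[0, 4] == png_magic
puts is_png
```

Only the small header is held in memory, so the upload itself still goes to disk in a streaming fashion.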