Welcome to the Crystal community, @orangeSi!
The issue seems to be that IO::Memory stores its capacity as an Int32, so once you’ve read 2GB into the buffer, its size overflows to become a negative number. I don’t know if this is the exact problem you’re experiencing, but it seems pretty likely.
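To make the arithmetic concrete, here is a small sketch of what happens to a 32-bit counter at the 2GB mark. It only illustrates the integer limits and doesn't touch IO::Memory's internals; the variable names are made up for the example.

```crystal
# Largest value an Int32 can hold: 2**31 - 1.
limit = Int32::MAX

# 2GB expressed as a byte count (hypothetical buffer size), kept in an Int64.
two_gb = 2_i64 * 1024 * 1024 * 1024

# &+ is Crystal's wrapping addition; one step past the limit wraps negative.
wrapped = limit &+ 1

puts limit   # 2147483647
puts two_gb  # 2147483648
puts wrapped # -2147483648
```

Note that the wrapping operator is used here on purpose to show the negative value. As far as I know, the plain + operator in recent Crystal versions raises OverflowError instead of wrapping silently.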
With very few exceptions, I doubt anyone actually needs to hold that much data in a single buffer, and it looks like you can avoid holding the entire output in memory by using IO.pipe instead of an IO::Memory and passing a block to Process.run:
reader, writer = IO.pipe
Process.run "cat xx.txt|cut -f 1-6|sort -k 1,1", shell: true, output: writer do |process|
  line = reader.gets # reads one line; call it repeatedly to stream the rest
end
Notice this code creates two IO objects with IO.pipe: we tell the child process to write to one end and our code reads from the other end.
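Here's a slightly fuller sketch of the same idea that reads until the child is done. The command is a hypothetical stand-in (a tiny printf piped through sort) so it runs anywhere, and the counter variables are only there to show that every line arrived. Closing the parent's copy of the write end is my assumption about how to make the reader see end-of-file. Treat this as one way to do it, not the only one.

```crystal
# Hypothetical stand-in command; swap in your own pipeline.
command = "printf 'b\\na\\nc\\n' | sort"

count = 0
last_line = ""
reader, writer = IO.pipe

Process.run(command, shell: true, output: writer) do |process|
  # The child has its own copy of the write end. Closing ours means the
  # reader reaches end-of-file once the child has finished writing.
  writer.close

  # Lines are handled one at a time instead of being buffered all at once.
  reader.each_line do |line|
    count += 1
    last_line = line
  end
end

reader.close
puts count
puts last_line
```

Since each line is dropped after it's handled, memory use stays roughly flat no matter how large the command's output gets.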
IO::Memory isn’t a good fit for streaming both input and output at the same time, especially across processes, since it only has a single position marker for both reading and writing. That is, if you call io.puts "foo", a following io.gets won’t read "foo", because your position in the buffer is at the end of what you just wrote. It also doesn’t release memory after it’s been read. That’s a feature (it’s what allows io.rewind), but it just doesn’t scale well for huge inputs. :-D
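A quick sketch of that single-position behavior, in case it helps. The variable names are just for illustration.

```crystal
io = IO::Memory.new
io.puts "foo"

# The shared position now sits just past "foo\n", so there is nothing to read.
first_read = io.gets

# Rewinding moves that same position back to the start of the buffer.
io.rewind
second_read = io.gets

p first_read  # nil
p second_read # "foo"
```

The data is still in the buffer after the first read attempt, which is exactly why rewinding works and why the memory is never given back.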