File reads, reversing a slice

I've been playing with file reads and writes. This loop reads from an input file and writes to an output file:

loop do
	buf = Slice(UInt8).new(4096)
	n = infile.read(buf)
	break if n == 0
	outfile.write(buf[0...n])
end

All is fine, the output file hashes correctly.

If I place a reverse! in there:

loop do
	buf = Slice(UInt8).new(4096)
	n = infile.read(buf)
	break if n == 0
	buf.reverse!
	outfile.write(buf[0...n])
end

... and run it twice, first on an input file and then on the file that the first run produced, I would have expected to be back at my starting point. But the final result (after running it twice) does not hash correctly, and the contents are gibberish.

If I then manually count the input file size and set the read_length accordingly:

loop do
	read_length = do_length(fsize)
	buf = Slice(UInt8).new(read_length)
	n = infile.read(buf)
	break if n == 0
	buf.reverse!
	outfile.write(buf[0...n])
	fsize -= read_length
end

where do_length is:

if infile_size < 4096 # infile_size counts down in the above loop
	read_length = infile_size
else
	read_length = 4096
end

Now it gives me my expected result: if I run it twice, once on an input file and then on that output, I'm back to the same hash as the original input. That is interesting, as only the last slice read would not be 4096. Furthermore, read_length must be an exponential growth of 8, i.e., 1024, 2048, 4096, as an arbitrary multiple of 8 will give trash results. As an example, 3072 does not work.

This snippet doesn’t actually do anything useful, just an interesting observation from a hobbyist, and I’m sure there’s an explanation.

In the second snippet, buf’s size is always 4096, so if n < 4096 then the first element becomes the element at index 4095 after reverse!, not index n - 1. I believe all you need to do is:

loop do
	buf = Slice(UInt8).new(4096)
	n = infile.read(buf)
	break if n == 0
	outfile.write(buf[0...n].reverse!)
end

That would produce less garbage and should work most of the time. But it still is not guaranteed to be reversible. If any read other than the last one happens to return fewer bytes than the buffer size, the chunk boundaries of the two runs no longer line up and the chunk-based reversal would be messed up.

The chunk size for reversing must be fixed.
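
To make the short-read problem concrete, here is a minimal sketch using an in-memory IO instead of a file. The 8-byte buffer and the 3-byte input are made-up values for illustration only; they stand in for the 4096-byte buffer and a final read that returns fewer bytes than the buffer holds.

```crystal
# Illustration only: a tiny buffer and an in-memory source (assumed values).
buf = Bytes.new(8)                  # zero-filled, like Slice(UInt8).new(8)
source = IO::Memory.new(Bytes[1, 2, 3])
n = source.read(buf)                # n == 3, buf is now [1, 2, 3, 0, 0, 0, 0, 0]

# Reversing the whole buffer moves the data to the end,
# so the first n bytes are the untouched zero padding.
whole = buf.dup
whole.reverse!
p whole[0...n]                      # Bytes[0, 0, 0]

# Reversing only the bytes that were actually read keeps the data.
part = buf.dup
p part[0...n].reverse!              # Bytes[3, 2, 1]
```

The first variant corresponds to calling buf.reverse! before slicing, the second to slicing before reversing.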

Thank you for the insights, although I have to say that I'm still confused as to why the initial buffer size must be (I realize now) a power of 2, i.e., 1, 2, 4, 8, ..., given that the last slice read can be any arbitrary size depending on the file size.

Here’s the whole segment that I played with today while experimenting, you can plug in other numbers to see what I mean:

def do_length(infile_size)
    if infile_size < 4096
        read_length = infile_size
    else
        read_length = 4096
    end
    return read_length
end

def main(file_name)
    fsize = File.size(file_name)
    outfile = File.open(file_name + ".out", "wb")
    infile = File.open(file_name, "rb")
    loop do
        read_length = do_length(fsize)
        buf = Slice(UInt8).new(read_length)
        n = infile.read(buf)
        break if n == 0
        buf.reverse!
        outfile.write(buf[0...n])
        fsize -= read_length
    end
    infile.close
    outfile.close
end

main(ARGV[0])

File#read is not guaranteed to fill buf. So buf.reverse! is wrong unless n == buf.size (see post #2 by HertzDevil in this thread).

I’m not entirely sure why your code works when the size of buf is a power of 2. It probably shouldn’t (or at least it cannot be expected).
Inspecting the internals could shed some light on that.
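
One possible explanation, offered as a guess rather than a verified fact: if the file is read through an internal buffer and each read returns at most what is left in that buffer, then a chunk size that divides the buffer size evenly never hits a short read before the end of the file, while a size like 3072 does. The sketch below only models that arithmetic. simulated_reads is a hypothetical helper, and the internal buffer size of 8192 is an assumed value, not taken from the Crystal source.

```crystal
# Hypothetical model of buffered reads; not part of Crystal's standard library.
# Each simulated read returns at most what is left in the internal buffer.
def simulated_reads(file_size : Int32, chunk : Int32, io_buffer : Int32) : Array(Int32)
  reads = [] of Int32
  pos = 0
  while pos < file_size
    left_in_buffer = io_buffer - (pos % io_buffer)
    n = Math.min(chunk, Math.min(left_in_buffer, file_size - pos))
    reads << n
    pos += n
  end
  reads
end

# Assumed internal buffer size of 8192 bytes and a 10000-byte file.
p simulated_reads(10_000, 4096, 8192) # [4096, 4096, 1808]
p simulated_reads(10_000, 3072, 8192) # [3072, 3072, 2048, 1808]
```

Under these assumptions the 3072 case gets a short read of 2048 in the middle of the file, which is exactly the situation where reversing the whole buffer goes wrong.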

But it’s probably more important to fix the code so it actually works as expected.
You need to make sure to fill the buffer to the full chunk size before reversing it.

Allocating buf outside the loop is more efficient. You can reuse it in each iteration instead of allocating a new one.
This could look like this:

def reverse(input, output, buffersize = 4096)
  buf = Bytes.new(buffersize)
  loop do
    bytes_read = read_greedy(input, buf)
    output.write(buf[0, bytes_read].reverse!)
    break if bytes_read < buf.bytesize        
  end
end

def read_greedy(io, slice)
  count = slice.size
  while slice.size > 0
    read_bytes = io.read slice
    break if read_bytes.zero?
    slice += read_bytes
  end
  count - slice.size
end

read_greedy courtesy of "Add methods for filling a buffer from an IO greedily/lazily" (Issue #14605, crystal-lang/crystal on GitHub).
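
As a rough way to check the round trip without touching real files, the following sketch repeats the two helpers from above and feeds them through a deliberately awkward IO. TrickleIO is a hypothetical test class (not part of the standard library) that never returns more than a few bytes per read, to mimic short reads. The sizes 10000, 3072, 7 and 5 are arbitrary.

```crystal
# Hypothetical test IO: caps every read at `max` bytes to force short reads.
class TrickleIO < IO
  def initialize(@inner : IO, @max : Int32)
  end

  def read(slice : Bytes) : Int32
    @inner.read(slice[0, Math.min(slice.size, @max)])
  end

  def write(slice : Bytes) : Nil
    @inner.write(slice)
  end
end

# Same helpers as in the post above, repeated so this block runs on its own.
def read_greedy(io, slice)
  count = slice.size
  while slice.size > 0
    read_bytes = io.read slice
    break if read_bytes.zero?
    slice += read_bytes
  end
  count - slice.size
end

def reverse(input, output, buffersize = 4096)
  buf = Bytes.new(buffersize)
  loop do
    bytes_read = read_greedy(input, buf)
    output.write(buf[0, bytes_read].reverse!)
    break if bytes_read < buf.bytesize
  end
end

original = Bytes.new(10_000) { |i| (i % 251).to_u8 }

# First pass, with reads capped at 7 bytes.
once = IO::Memory.new
reverse(TrickleIO.new(IO::Memory.new(original), 7), once, 3072)

# Second pass over the first pass's output, with a different cap.
twice = IO::Memory.new
reverse(TrickleIO.new(IO::Memory.new(once.to_slice), 5), twice, 3072)

puts twice.to_slice == original # expected: true
```

Because read_greedy fills the buffer before each reversal, the chunk boundaries are the same in both passes even with a chunk size of 3072, which is not a power of 2.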