Read nrrd-file with attached header and gzipped data

Hi there,

I want to read an NRRD file.
See here for the format: Fileformat

Reading the header works fine by reading a few single lines:

# open the NRRD-file Header with followed compressed data
head = File.read_lines("file.nrrd")

# read the header
head.each do |line|
    # empty line for the end of the header
    if line == ""
        break
    elsif line =~ /(type: )/ # searching for "type: "
        if il = (line =~ /(: )/) # position for "double"
            header = line[il+2..]
        end
    end
end

After the header there is a single blank line followed by the gzipped data.

How can I read this compressed data and decompress it into an array of Float64 values?

Thanks for any help.

Bob

Found a solution that works.
Maybe someone else can use this stuff:

require "compress/gzip"

# will hold the value of the "type" field, e.g. "double"
header = ""

# open the NRRD-file Header with followed compressed data
content = File.open(file)

# reading the header
content.each_line do |line|
    # empty line for the end of the header
    if line == ""
        break
    elsif line =~ /(type: )/ # searching for "type: "
        if il = (line =~ /(: )/) # position for "double"
            header = line[il+2..]
        end
    end
end

# read the rest of the file
data = content.gets_to_end

io_data = IO::Memory.new data

# decompress the data
reader =  Compress::Gzip::Reader.new(io_data)
string = String::Builder.build do |builder|
    IO.copy(reader, builder)
end

# string to float64
arr = [] of Float64
io = IO::Memory.new(string)
# each Float64 takes 8 bytes, so use integer division to count the values
values = io.bytesize // 8
values.times do
    arr << io.read_bytes(Float64, IO::ByteFormat::LittleEndian)
end
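
The header loop above keeps only the type field. Since an NRRD header also carries fields such as encoding and endian, a minimal sketch could collect every "field: value" line into a Hash and stop at the blank line. The helper name read_nrrd_header is hypothetical, and the sketch simply skips any line without ": " (such as the magic line):

```crystal
# Hypothetical helper, not part of any library: collects the "field: value"
# lines of an NRRD header and stops at the blank line that ends it.
def read_nrrd_header(io : IO) : Hash(String, String)
  fields = {} of String => String
  io.each_line do |line|
    break if line.empty?           # blank line marks the end of the header
    next if line.starts_with?('#') # comment lines start with '#'
    key, sep, value = line.partition(": ")
    fields[key] = value unless sep.empty? # skips lines without ": "
  end
  fields
end

# Usage with an in-memory header; an IO from File.open is used the same way.
sample = IO::Memory.new("NRRD0004\ntype: double\nencoding: gzip\nendian: little\n\nDATA")
header = read_nrrd_header(sample)
puts header["type"] # prints "double"
```

After the call, the IO is positioned right behind the blank line, so the compressed data can be read from it directly.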

This looks good.

But you don’t need to read the entire stream into memory in between. Instead, you can pass the file stream directly to the decompressor. That’s much more efficient because it uses less memory.

Something like this:

 # read the rest of the file
-data = content.gets_to_end
-
-io_data = IO::Memory.new data
-
 # decompress the data
-reader =  Compress::Gzip::Reader.new(io_data)
+io =  Compress::Gzip::Reader.new(content)
-string = String::Builder.build do |builder|
-  IO.copy(reader, builder)
-end
-io = IO::Memory.new(string)

Note that with this approach you can’t know the bytesize in advance; instead you can keep reading until you hit EOS.

Oh, much easier with only 1 line of code! Thanks a lot.

Is there a better way than running into the EOFError?

io = Compress::Gzip::Reader.new(content)

arr = [] of Float64
loop do
    begin
        arr << io.read_bytes(Float64, IO::ByteFormat::LittleEndian)
    rescue IO::EOFError
        # end of the decompressed stream
        break
    end
end

I think you could do something like:

arr = [] of Float64

while (bytes = io.peek) && !bytes.empty?
  arr << io.read_bytes(Float64, IO::ByteFormat::LittleEndian)
end

Though this assumes that the IO contains only Float64 values; otherwise there may be some data left in the io that isn’t a valid float.

Alternatively, use read (just read) with a slice, then deserialize the floats from it. When read returns zero, you have reached the end of the stream. It may also be faster that way if you pass it a large buffer.
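
A rough sketch of that idea, assuming little-endian Float64 data as in the code above. The helper name read_floats and its chunk_values parameter are made up for illustration. Because read may return fewer bytes than requested, the sketch carries an incomplete trailing value over to the next read:

```crystal
require "compress/gzip"

# Hypothetical helper: reads little-endian Float64 values from any IO in
# large chunks. `chunk_values` (an assumed tuning knob, must be >= 1) is the
# number of values the buffer can hold.
def read_floats(io : IO, chunk_values : Int32 = 8192) : Array(Float64)
  width = sizeof(Float64) # 8 bytes per value
  floats = [] of Float64
  buffer = Bytes.new(chunk_values * width)
  filled = 0 # bytes carried over from the previous read

  loop do
    n = io.read(buffer[filled..])
    break if n == 0 # read returns 0 at the end of the stream
    filled += n

    # decode every complete value currently in the buffer
    usable = filled - filled % width
    offset = 0
    while offset < usable
      floats << IO::ByteFormat::LittleEndian.decode(Float64, buffer[offset, width])
      offset += width
    end

    # move the bytes of an incomplete value to the front of the buffer
    rest = filled - usable
    rest.times { |i| buffer[i] = buffer[usable + i] }
    filled = rest
  end

  floats # leftover bytes that do not form a full value are dropped
end

# Usage with gzipped sample data built in memory
raw = IO::Memory.new
Compress::Gzip::Writer.open(raw) do |gz|
  [1.5, -2.0, 3.25].each { |v| gz.write_bytes(v, IO::ByteFormat::LittleEndian) }
end
raw.rewind

values = read_floats(Compress::Gzip::Reader.new(raw))
puts values # prints [1.5, -2.0, 3.25]
```

Whether this is actually faster than calling read_bytes per value depends on the data size and the buffering of the underlying IO, so it is worth measuring.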

Yeah, we’re missing IO#read_bytes? for optional reading (tracked in crystal-lang/crystal issue #6905, “Add IO#read_string? and #read_bytes?”).
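
In the meantime, one possible workaround is IO#read_fully?, which returns nil instead of raising when the stream ends before the slice is filled. This is only a sketch: the IO::Memory below stands in for the Compress::Gzip::Reader, and trailing bytes that do not make up a full value end the loop silently.

```crystal
# Sample data; in the real program `io` would be the Compress::Gzip::Reader.
io = IO::Memory.new
[0.5, 2.0].each { |v| io.write_bytes(v, IO::ByteFormat::LittleEndian) }
io.rewind

buf = Bytes.new(sizeof(Float64)) # room for exactly one Float64
arr = [] of Float64

# read_fully? returns nil once fewer than 8 bytes are left, which ends the loop
while io.read_fully?(buf)
  arr << IO::ByteFormat::LittleEndian.decode(Float64, buf)
end

puts arr # prints [0.5, 2.0]
```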

Thanks a lot for the help!