Zlib Compress Help Needed

I’ve been wrestling with compressing and decompressing strings using Zlib. From the Zlib spec, I’ve written the following methods and tests:

require "spec"
require "compress/zlib"

def compress(data : String) : String
  io = IO::Memory.new
  writer = Compress::Zlib::Writer.new(io)
  writer.flush
  writer.print data
  writer.close
  return io.to_s
end

def decompress(data : String) : String
  io = IO::Memory.new
  data.scan(/../).each do |match|
    io.write_byte match[0].to_u8(16)
  end
  io.rewind
  reader = Compress::Zlib::Reader.new(io)
  str = String::Builder.build do |builder|
    IO.copy(reader, builder)
  end
  return str
end

it "compression" do
  data = "this is a test string !!!!\n"
  encdata = compress(data)
  encdata.should eq "789c2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b"
end

it "decompress" do
  data = "789c2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b"
  decdata = decompress(data)
  decdata.should eq "this is a test string !!!!\n"
end

The decompress spec passes, but the compress spec fails. The test output is:
Expected: "789c2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b"
     got: "x\x9C\u0000\u0000\u0000\xFF\xFF+\xC9\xC8,V\u0000\xA2D\x85\x92\xD4\xE2\u0012\x85⒢̼t\u0005E \xE0\u0002\u0000\x85O\b{"

What do I need to do to this IO::Memory object to get the expected string?

For me, both of your specs were failing. However, this seems to work:

require "spec"
require "compress/zlib"

def compress(data : String) : String
  io = IO::Memory.new
  writer = Compress::Zlib::Writer.new(io)
  writer.flush
  writer.print data
  writer.close
  return io.to_slice.hexstring
end

def decompress(data : String) : String
  io = IO::Memory.new data.hexbytes
  reader = Compress::Zlib::Reader.new(io)
  str = String::Builder.build do |builder|
    IO.copy(reader, builder)
  end
  return str
end

it "compression" do
  data = "this is a test string !!!!\n"
  encdata = compress(data)
  encdata.should eq "789c000000ffff2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b"
end

it "decompress" do
  data = "789c000000ffff2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b"
  decdata = decompress(data)
  decdata.should eq "this is a test string !!!!\n"
end

I think the problem was twofold.

First, zlib compression produces raw bytes, while your specs expect hex strings. I updated compress to take the contents of the io as a slice and return a hex string representation of those bytes.

On the decompression side, you can just call .hexbytes to get a Bytes representation of the hex string and pass that to the IO, instead of converting it manually.
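To illustrate the two conversions described above, here is a minimal sketch. It assumes Compress::Zlib::Writer.open with default settings and reuses the test string from the specs; the variable names raw and hex are only for illustration.

```crystal
require "compress/zlib"

# Compress a short string into an in-memory IO.
io = IO::Memory.new
Compress::Zlib::Writer.open(io) do |writer|
  writer.print "this is a test string !!!!\n"
end

raw = io.to_slice   # the compressed data as raw bytes (Bytes)
hex = raw.hexstring # hex text, two characters per byte

puts hex
puts hex.hexbytes == raw # String#hexbytes reverses Bytes#hexstring
```

Calling io.to_s instead would build a String straight from the raw bytes, which is why the original compress spec printed escaped binary data rather than hex.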

Hi @Blacksmoke16, may I ask you a question?

  reader = Compress::Zlib::Reader.new(io)
  str = String::Builder.build do |builder|
    IO.copy(reader, builder)
  end

I saw that you use String::Builder.build to convert the reader to a string. My question is: how do you know where String::Builder.build is needed? Does the reader implement some interface that makes this possible?

Thank you.

I didn’t pay too much attention to the code that was already there, but in practice you shouldn’t use String::Builder directly (as the API docs point out); use String.build instead.

However, since Compress::Zlib::Reader is also an IO you could just do .gets_to_end and skip needing String.build at all. I.e. Compress::Zlib::Reader.new(IO::Memory.new data.hexbytes).gets_to_end.
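As a rough side-by-side sketch of the two approaches, using the hex string from the specs (the variable names via_build and via_gets are only for illustration):

```crystal
require "compress/zlib"

hex = "789c2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b"

# Option 1: String.build yields a builder that is itself an IO,
# so IO.copy can stream the decompressed data into it.
via_build = String.build do |builder|
  reader = Compress::Zlib::Reader.new(IO::Memory.new(hex.hexbytes))
  IO.copy(reader, builder)
end

# Option 2: the reader is an IO too, so gets_to_end reads everything at once.
via_gets = Compress::Zlib::Reader.new(IO::Memory.new(hex.hexbytes)).gets_to_end

puts via_build == via_gets # both hold the same decompressed text
```

Both work because the reader and the builder are IO objects; gets_to_end is simply the shorter route when all you want is the whole content as a String.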


Appreciate the feedback and the explanation. I see these strings have an added 000000ffff that isn’t in the standard library spec for Compress::Zlib. Does that matter?

It doesn’t seem to make a difference. I tried the hex string in Python, and it decompressed just fine both with and without the extra bytes:

$ echo "789c000000ffff2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b" | python -c "import zlib,sys;print(repr(zlib.decompress(bytes.fromhex(sys.stdin.read()))))"
b'this is a test string !!!!\n'

$ echo "789c2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b" | python -c "import zlib,sys;print(repr(zlib.decompress(bytes.fromhex(sys.stdin.read()))))"
b'this is a test string !!!!\n'

Maybe someone more familiar with the spec/implementation would know where they’re from. My guess would be some sort of padding, but :person_shrugging:.
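One possible explanation, offered as a sketch and not a definitive answer: the extra bytes may come from the writer.flush call that runs before any data is written. In deflate, a sync flush emits an empty stored block, and its byte-aligned form ends in 00 00 ff ff. The helper name compress_hex and its flush_first parameter below are hypothetical, introduced only for this comparison.

```crystal
require "compress/zlib"

# Hypothetical helper: compress to a hex string, optionally flushing
# the writer before any data has been written.
def compress_hex(data : String, flush_first : Bool) : String
  io = IO::Memory.new
  writer = Compress::Zlib::Writer.new(io)
  writer.flush if flush_first # the early flush under suspicion
  writer.print data
  writer.close
  io.to_slice.hexstring
end

text = "this is a test string !!!!\n"
with_flush = compress_hex(text, true)
without_flush = compress_hex(text, false)

puts with_flush
puts without_flush
# If the early flush is the cause, removing the marker makes them equal.
puts with_flush.sub("000000ffff", "") == without_flush
```

If that holds, dropping the writer.flush line from compress should produce the shorter string from the standard library spec. Either form decompresses to the same text, which matches the Python check above.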
