I’ve been wrestling with compressing and decompressing strings using Zlib. From the Zlib spec, I’ve written the following methods and tests:
require "spec"
require "compress/zlib"

def compress(data : String) : String
  io = IO::Memory.new
  writer = Compress::Zlib::Writer.new(io)
  writer.flush
  writer.print data
  writer.close
  return io.to_s
end

def decompress(data : String) : String
  io = IO::Memory.new
  data.scan(/…/).each do |match|
    io.write_byte match[0].to_u8(16)
  end
  io.rewind
  reader = Compress::Zlib::Reader.new(io)
  str = String::Builder.build do |builder|
    IO.copy(reader, builder)
  end
  return str
end

it "compression" do
  data = "this is a test string !!!\n"
  encdata = compress(data)
  encdata.should eq "789c2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b"
end

it "decompress" do
  data = "789c2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b"
  decdata = decompress(data)
  decdata.should eq "this is a test string !!!\n"
end
The decompress passes the test but the compress fails. The test returns:
Expected: “789c2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b”
got: “x\x9C\u0000\u0000\u0000\xFF\xFF+\xC9\xC8,V\u0000\xA2D\x85\x92\xD4\xE2\u0012\x85⒢̼t\u0005E \xE0\u0002\u0000\x85O\b{”
What do I need to do to this IO::Memory object to get the expected string?
For me, both of your specs were failing. However this seems to work:
require "spec"
require "compress/zlib"

def compress(data : String) : String
  io = IO::Memory.new
  writer = Compress::Zlib::Writer.new(io)
  writer.flush
  writer.print data
  writer.close
  return io.to_slice.hexstring
end

def decompress(data : String) : String
  io = IO::Memory.new data.hexbytes
  reader = Compress::Zlib::Reader.new(io)
  str = String::Builder.build do |builder|
    IO.copy(reader, builder)
  end
  return str
end

it "compression" do
  data = "this is a test string !!!!\n"
  encdata = compress(data)
  encdata.should eq "789c000000ffff2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b"
end

it "decompress" do
  data = "789c000000ffff2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b"
  decdata = decompress(data)
  decdata.should eq "this is a test string !!!!\n"
end
I think the problem was twofold.
First, zlib compression produces raw bytes, while your specs expect hex strings. I updated compress to convert the IO's contents to a slice and then return a hex string representation of those bytes.
On the decompression side, you can just call .hexbytes to get a Bytes representation of the hex string and pass that to the IO, instead of converting it manually.
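To illustrate the raw-bytes-versus-hex point outside of Crystal, here is a minimal Python sketch. Python's standard `zlib` module is used only as a convenient cross-check, and the helper names `compress_hex` and `decompress_hex` are made up for this example. It compresses to raw bytes and only then converts to hex text, roughly mirroring `io.to_slice.hexstring` and `.hexbytes`:

```python
import zlib


def compress_hex(text: str) -> str:
    """Compress text and return the compressed bytes as a hex string."""
    raw = zlib.compress(text.encode("utf-8"))  # raw bytes, not printable text
    return raw.hex()


def decompress_hex(hex_text: str) -> str:
    """Turn a hex string back into bytes, then decompress to text."""
    raw = bytes.fromhex(hex_text)
    return zlib.decompress(raw).decode("utf-8")


original = "this is a test string !!!!\n"
encoded = compress_hex(original)
print(encoded)                        # hex digits only
print(repr(decompress_hex(encoded)))  # round-trips back to the original text
```

Comparing the raw compressed bytes directly against a hex literal cannot succeed, which is why the conversion has to happen on one side or the other.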
reader = Compress::Zlib::Reader.new(io)
str = String::Builder.build do |builder|
  IO.copy(reader, builder)
end
I see you use String::Builder.build to read the reader into a string. My question is: how do I know when String::Builder.build is needed? Does the reader implement some interface that makes this possible?
I didn’t pay too much attention to the code that was already there, but in practice you shouldn’t use String::Builder directly (as the API docs point out); use String.build instead.
However, since Compress::Zlib::Reader is also an IO you could just do .gets_to_end and skip needing String.build at all. I.e. Compress::Zlib::Reader.new(IO::Memory.new data.hexbytes).gets_to_end.
Appreciate the feedback and the explanation. I see these strings have an added 000000ffff that isn’t in the std lib spec for Compress::Zlib. Does that matter?
Doesn’t seem to make a difference. I tried the hex string in Python and it was able to decompress it just fine, both with and without the extra bytes:
$ echo "789c000000ffff2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b" | python -c "import zlib,sys;print(repr(zlib.decompress(bytes.fromhex(sys.stdin.read()))))"
b'this is a test string !!!!\n'
$ echo "789c2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b" | python -c "import zlib,sys;print(repr(zlib.decompress(bytes.fromhex(sys.stdin.read()))))"
b'this is a test string !!!!\n'
Maybe someone more familiar with the spec/implementation would know where they come from. My guess would be some sort of padding, but I’m not sure.
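For what it's worth, the extra `000000ffff` bytes are consistent with what DEFLATE emits on a sync flush: an empty stored block, made up of one byte holding the block header bits plus padding (`00`), a length of zero (`0000`), and the one's complement of that length (`ffff`). The `writer.flush` call before `writer.print` in `compress` would explain why the block sits right after the `789c` header. The Python sketch below reproduces the effect, assuming Python's `zlib` module and Crystal's `Compress::Zlib` both wrap the same underlying zlib library. It is an illustration, not a statement about Crystal's internals:

```python
import zlib

data = b"this is a test string !!!!\n"

# Flush before writing anything, as the compress method in this thread does.
comp = zlib.compressobj()
prefix = comp.flush(zlib.Z_SYNC_FLUSH)     # zlib header followed by an empty stored block
body = comp.compress(data) + comp.flush()  # compressed data plus the Adler-32 trailer

print(prefix.hex())
print((prefix + body).hex())

# The empty block carries no data, so the stream still decompresses to the same text.
assert zlib.decompress(prefix + body) == data

# Both hex strings from the thread decode to the same text as well.
with_flush = "789c000000ffff2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b"
without_flush = "789c2bc9c82c5600a2448592d4e21285e292a2ccbc74054520e00200854f087b"
assert zlib.decompress(bytes.fromhex(with_flush)) == zlib.decompress(bytes.fromhex(without_flush))
```

If that reading is right, dropping the early `writer.flush` should give output without the extra bytes, though that has not been verified against the Crystal std lib spec here.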