Error when reading gzip compressed data with an extra field

Hello everyone, when I read a gz file that has an extra field, I get an error. The demo data is made like this:

$echo -n 'H4sIBAAAAAAA/wYAQkMCADIAS0ksTuNKSyxO4UqtSC4pSuQqLilNS+MCAI56o3cXAAAAH4sIBAAAAAAA/wYAQkMCABsAAwAAAAAAAAAAAA==' | base64 -d > gggg2.gz

$ zcat gggg2.gz
dasf
fasd
exctra
stuff

$file gggg2.gz
gggg2.gz: gzip compressed data, extra field

$gzip -dc gggg2.gz |gzip >gggg3.gz

$file gggg3.gz
gggg3.gz: gzip compressed data, last modified: Tue Mar 24 02:15:48 2020, from Unix

My demo code, demo.cr, is this:

require "gzip"

file = ARGV[0]
puts "input is #{file}"
Gzip::Reader.open(file) do |gfile|
  gfile.each_line do |line|
    puts line
  end
end

Then I test the demo code and data like this:

$crystal build demo.cr

$./demo gggg3.gz
input is gggg3.gz
dasf
fasd
exctra
stuff

$./demo gggg2.gz 
input is gggg2.gz
Unhandled exception: flate: invalid stored block lengths (Flate::Error)
  from /usr/share/crystal/src/flate/reader.cr:112:13 in 'unbuffered_read'
  from /usr/share/crystal/src/io/buffered.cr:79:16 in 'read'
  from /usr/share/crystal/src/gzip/reader.cr:90:7 in 'unbuffered_read'
  from /usr/share/crystal/src/io/buffered.cr:214:12 in 'fill_buffer'
  from /usr/share/crystal/src/io/buffered.cr:102:7 in 'peek'
  from /usr/share/crystal/src/io.cr:632:37 in 'gets'
  from /usr/share/crystal/src/io.cr:605:5 in 'gets'
  from /usr/share/crystal/src/io.cr:575:5 in 'gets'
  from /usr/share/crystal/src/io.cr:574:3 in 'gets'
  from /usr/share/crystal/src/io.cr:918:18 in '__crystal_main'
  from /usr/share/crystal/src/crystal/main.cr:106:5 in 'main_user_code'
  from /usr/share/crystal/src/crystal/main.cr:92:7 in 'main'
  from /usr/share/crystal/src/crystal/main.cr:115:3 in 'main'
  from __libc_start_main
  from _start
  from ???


$cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.4 LTS"

$crystal --version
Crystal 0.33.0 [612825a53] (2020-02-14)

LLVM: 8.0.0
Default target: x86_64-unknown-linux-gnu

Can you share the gz file?

More importantly, the Crystal code which causes this error.

And please try decompressing the file with gunzip to make sure it’s not simply a corrupted file.

Sounds like your file is broken.

I checked the integrity like this:

$gzip -vt  xx.gz
gzip: xx.gz: extra field of 6 bytes ignored
...
gzip: xx.gz: extra field of 6 bytes ignored
 OK

@asterite sorry, I can’t share all my code, but the relevant part is this:

Gzip::Reader.open(file) do |gfile|
  gfile.each_line do |line|
    i += 1
    if i % 4 == 1 || i % 4 == 2
      yield line
    end
  end
end

When I use a regular gz file, yy.gz, recompressed from xx.gz, the error is gone! So I guess Crystal does not recognize the unusual header of xx.gz.

$ gzip -dc xx.gz|gzip >yy.gz
$ file yy.gz
yy.gz:      gzip compressed data, last modified: Thu Mar 21 05:30:03 2020, from Unix
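
To double-check that recompressing only changes the header and not the data, something like the sketch below could be used. It assumes a POSIX shell with gzip, base64, od and cksum available, and it uses the demo data posted in this thread (gggg2.gz) instead of xx.gz, which was not shared. The variable names are only for illustration.

```shell
#!/bin/sh
# Demo data copied from this thread (a gzip file with an extra field).
printf '%s' 'H4sIBAAAAAAA/wYAQkMCADIAS0ksTuNKSyxO4UqtSC4pSuQqLilNS+MCAI56o3cXAAAAH4sIBAAAAAAA/wYAQkMCABsAAwAAAAAAAAAAAA==' | base64 -d > gggg2.gz

# Recompress: decode with gzip, then write a fresh gzip stream.
gzip -dc gggg2.gz | gzip > gggg3.gz

# Checksums of the decompressed payloads before and after.
sum_before=$(gzip -dc gggg2.gz | cksum)
sum_after=$(gzip -dc gggg3.gz | cksum)

# FLG is byte 3 of the gzip header; bit 2 (value 4) marks an extra field.
flg_after=$(od -An -tu1 -j3 -N1 gggg3.gz | tr -d ' \n')

[ "$sum_before" = "$sum_after" ] && echo "payload unchanged"
[ $((flg_after & 4)) -eq 0 ] && echo "extra field gone"
```

If both messages print, the recompressed file carries the same lines but no extra field, which fits the guess that the header is what trips up the reader.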

Sorry, I can’t share the gz file, but I googled and found this:

Gzip files can contains an extra field and some applications use this for extending gzip format. The current GzipFile implementation ignores this field on input and doesn't allow to create a new file with an extra field.

from https://bugs.python.org/issue17681
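
For anyone who wants to see the extra field itself, here is a rough sketch that reads it straight from the header. It relies on the gzip header layout from RFC 1952: byte 3 is FLG, bit 2 of FLG (value 4) is FEXTRA, and when it is set, bytes 10-11 hold XLEN (little-endian) followed by XLEN bytes of extra data. It assumes a POSIX shell with base64 and od, and uses the demo data from this thread; the variable names are made up for the example.

```shell
#!/bin/sh
# Demo data copied from this thread.
printf '%s' 'H4sIBAAAAAAA/wYAQkMCADIAS0ksTuNKSyxO4UqtSC4pSuQqLilNS+MCAI56o3cXAAAAH4sIBAAAAAAA/wYAQkMCABsAAwAAAAAAAAAAAA==' | base64 -d > gggg2.gz

# Read one byte at the given offset as an unsigned decimal number.
byte_at() {
  od -An -tu1 -j"$1" -N1 gggg2.gz | tr -d ' \n'
}

# Byte 3 is FLG; bit 2 (value 4) is FEXTRA.
flg=$(byte_at 3)

if [ $((flg & 4)) -ne 0 ]; then
  # XLEN is a 2-byte little-endian length stored right after the 10 fixed bytes.
  xlen=$(( $(byte_at 10) + 256 * $(byte_at 11) ))
  # The extra field itself starts at offset 12.
  extra_hex=$(od -An -tx1 -j12 -N"$xlen" gggg2.gz | tr -d ' \n')
  echo "FEXTRA set, XLEN=$xlen, extra=$extra_hex"
else
  echo "no extra field"
fi
```

On the demo file this reports a 6-byte extra field, the same size gzip -vt reported for xx.gz. The subfield ID bytes 42 43 ("BC") look like what BGZF-style tools write, but that is only my guess about where such files come from.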

Find a sample out in the wild? :)

@asterite @straight-shoota @rogerdpack thanks~
Finally I googled the gzip extra field and made demo data that reproduces the error. The commands, demo.cr, and output are the same as shown in the first post: ./demo gggg3.gz prints the four lines, while ./demo gggg2.gz fails with "flate: invalid stored block lengths (Flate::Error)".

Nice, maybe file an issue for it (a PR that fixes it is also welcome :)).

OK, I submitted an issue at https://github.com/crystal-lang/crystal/issues/8933