Compress::Gzip::Reader cannot open a file in BGZF format. Is this a bug?

BGZF stands for Blocked GNU Zip Format. It is a compression format commonly used by genomics file formats. A BGZF file is a series of independently compressed blocks. With a pre-built index, only the blocks covering the required range need to be decompressed, which allows random access to large compressed files.
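For context, here is a minimal sketch of how a BGZF block can be recognized. It assumes the layout described in the SAM/BAM specification: each block is a gzip member whose header has the FEXTRA flag set and carries a `BC` extra subfield holding the total block size minus one. The helper names `le16` and `bgzf_block_size` are made up for illustration and are not part of Crystal's standard library.

```crystal
# Read an unsigned 16-bit little-endian integer at `offset`.
def le16(bytes : Bytes, offset : Int32) : Int32
  bytes[offset].to_i | (bytes[offset + 1].to_i << 8)
end

# Hypothetical helper: returns the total size of the BGZF block that starts
# at the beginning of `data`, or nil if `data` does not start with one.
def bgzf_block_size(data : Bytes) : Int32?
  # A gzip header with an extra field is at least 12 bytes long.
  return nil if data.size < 12
  # gzip magic bytes (0x1f 0x8b) and compression method 8 (deflate).
  return nil unless data[0] == 0x1f && data[1] == 0x8b && data[2] == 8
  # The FEXTRA flag (bit 2 of FLG) must be set.
  return nil unless (data[3] & 0x04_u8) != 0

  xlen = le16(data, 10)
  return nil if data.size < 12 + xlen
  extra = data[12, xlen]

  # Walk the extra subfields looking for SI1 = 'B' (66), SI2 = 'C' (67).
  pos = 0
  while pos + 4 <= extra.size
    slen = le16(extra, pos + 2)
    if extra[pos] == 66 && extra[pos + 1] == 67 && slen == 2 && pos + 6 <= extra.size
      # BSIZE stores the total block size minus one.
      return le16(extra, pos + 4) + 1
    end
    pos += 4 + slen
  end
  nil
end

# The 28-byte empty block that bgzip appends as an end-of-file marker.
eof_marker = Bytes[
  0x1f, 0x8b, 0x08, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0xff, 0x06, 0x00,
  0x42, 0x43, 0x02, 0x00, 0x1b, 0x00, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
  0x00, 0x00, 0x00, 0x00,
]
puts bgzf_block_size(eof_marker) # => 28
```

Because each block records its own size, a reader can hop from block to block without inflating anything, which is what makes index-based random access possible.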

BGZF files are generated with the bgzip command, but they can also be decompressed with zcat or gzip -d.

I thought BGZF was almost fully compatible with Gzip. However, I found that Crystal’s Compress::Gzip::Reader cannot read it. I wasn’t sure whether this is expected behavior or a bug, so I’m reporting it here instead of opening a GitHub issue.

Thank you.

Steps to Reproduce

  1. Install tabix:

    sudo apt install tabix
    
    dpkg -L tabix | grep bgzip
    # /usr/bin/bgzip
    # /usr/share/man/man1/bgzip.1.gz
    
  2. Compress a file with bgzip:

    bgzip -k your.txt
    
  3. Decompress and view the file:

    zcat your.txt.gz
    
  4. Read the compressed file in Crystal:

    require "compress/gzip"
    
    string = File.open("your.txt.gz") do |file|
      Compress::Gzip::Reader.open(file) do |gzip|
        gzip.gets_to_end
      end
    end
    
Unhandled exception: deflate: invalid stored block lengths (Compress::Deflate::Error)
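Since zcat handles the file, one possible stopgap is to shell out to it from Crystal. This is only a sketch: `read_with_zcat` is a hypothetical helper name, and it assumes zcat is installed and on the PATH.

```crystal
# Hypothetical helper: decompress a gzip-compatible file (including BGZF)
# by delegating to the external zcat command.
def read_with_zcat(path : String) : String
  output = IO::Memory.new
  # Capture zcat's standard output in memory.
  status = Process.run("zcat", [path], output: output)
  raise "zcat failed for #{path}" unless status.success?
  output.to_s
end
```

Calling `read_with_zcat("your.txt.gz")` would return the decompressed contents as a String. It loads the whole file into memory and gives up random access, so it is only reasonable for small files.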

I think it’s “just” a missing feature, not a bug.

Of course, missing a feature can in some way be seen as a bug because it means the implementation is incomplete. Depends on how far you want to stretch the definitions…
It would be a bug if Compress::Gzip claimed to support everything gzip does.

Anyway, contributions are welcome 😆

Since the standard library needs to work reliably and be maintained over the long term, it may be more appropriate for BGZF to be supported by a third-party library.
In any case, thank you.