BGZF stands for Blocked GNU Zip Format. It is a compression format commonly used by genomics file formats. A BGZF file is a series of independently compressed blocks, so with a pre-built index only the blocks covering the required range need to be decompressed. This allows random access to large compressed files.
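To make the block layout a little more concrete, here is a rough sketch that writes out the 28-byte empty block which, as far as I understand the SAM/BAM specification, marks the end of a BGZF file, and then reads the block size back from its header. The file name eof_block.gz is made up for this example, and the byte offsets are my reading of the spec rather than anything stated in this thread.

```shell
# Write the 28-byte empty BGZF block (the end-of-file marker described in the
# SAM/BAM specification) using octal escapes. eof_block.gz is a hypothetical name.
printf '\037\213\010\004\000\000\000\000\000\377\006\000\102\103\002\000\033\000\003\000\000\000\000\000\000\000\000\000' > eof_block.gz

# Bytes 12-13 of the header hold the extra-subfield ID, expected to be "BC".
subfield_id=$(dd if=eof_block.gz bs=1 skip=12 count=2 2>/dev/null)

# Bytes 16-17 hold BSIZE (total block size minus 1), stored little-endian.
# Read them as two unsigned bytes so the result does not depend on host endianness.
set -- $(od -An -tu1 -j16 -N2 eof_block.gz)
block_size=$(( $1 + $2 * 256 + 1 ))

# The block is still a valid (empty) gzip member, so gzip decodes it to zero bytes.
decoded_bytes=$(gzip -dc eof_block.gz | wc -c | tr -d ' ')

echo "subfield=$subfield_id block_size=$block_size decoded_bytes=$decoded_bytes"
```

Because every block records its own size this way, a reader can hop from block to block without inflating the data in between, which is what makes an index useful.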
A BGZF file is generated with the command bgzip, but can be decompressed with
I thought BGZF was almost fully compatible with Gzip. However, I found that Crystal’s Compress::Gzip::Reader cannot read it. I wasn’t sure whether this was expected behavior or a bug, so I’m reporting it here instead of opening a GitHub issue.
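For context, the reason I expected compatibility is that a gzip file may contain several members back to back, and the standard gzip tool decodes them as one stream. A minimal sketch of that behavior, using only gzip and made-up file names (part1.gz, part2.gz, joined.gz), with no BGZF-specific tooling involved:

```shell
# Compress two pieces separately; each output is a complete gzip member.
printf 'hello ' | gzip -c > part1.gz
printf 'world\n' | gzip -c > part2.gz

# Concatenate the two members into a single file.
cat part1.gz part2.gz > joined.gz

# gzip decodes the members in order and yields the combined text.
joined_text=$(gzip -dc joined.gz)
echo "$joined_text"
```

This only shows what the gzip command-line tool does with concatenated members. It says nothing about which part of a BGZF file Crystal's reader trips over.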
Steps to Reproduce
sudo apt install tabix
dpkg -L tabix | grep bgzip
Compress a file with
bgzip -k your.txt
Decompress and view the file:
Read the compressed file in Crystal:
require "compress/gzip"
string = File.open("your.txt.gz") do |file|
  Compress::Gzip::Reader.open(file, &.gets_to_end)
end
Unhandled exception: deflate: invalid stored block lengths (Compress::Deflate::Error)
I think it’s “just” a missing feature, not a bug.
Of course, missing a feature can in some way be seen as a bug because it means the implementation is incomplete. Depends on how far you want to stretch the definitions…
It would be a bug if Compress::Gzip claimed to support everything.
Anyway, contributions are welcome.
Since the standard library needs to work reliably and be maintained over the long term, it may be more appropriate for BGZF to be supported by a third-party library.
In any case, thank you.