Compress::Gzip::Reader cannot open a file in BGZF format. Is this a bug?

I ran into this problem again today and was a bit annoyed, so I spent a few hours investigating. The BGZF format is frequently used for biological data, so I can't afford to be blocked every time I open one of these files.

Then I identified a very suspicious part.

According to the gzip specification (RFC 1952), the length of the EXTRA field (XLEN) is 2 bytes. However, only one byte is read here.

The length appears to be stored in little-endian order (least-significant byte first). I think that further delayed the discovery of the problem.
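To make the point concrete, here is a minimal Python sketch (not the Crystal standard library code) of how a reader could handle the EXTRA field by reading both bytes of the length as a little-endian integer. The function name `read_gzip_extra` and the hand-built header bytes are my own illustrative assumptions, not taken from any library.

```python
import io
import struct

def read_gzip_extra(stream):
    """Read the fixed gzip header and return the raw EXTRA field (or b"")."""
    # Fixed 10-byte header: ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
    id1, id2, cm, flg, mtime, xfl, os_ = struct.unpack("<BBBBIBB", stream.read(10))
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("not a gzip stream")
    if not flg & 0x04:  # FEXTRA bit is not set, so there is no EXTRA field
        return b""
    # The length is a 2-byte little-endian integer, so read both bytes
    (xlen,) = struct.unpack("<H", stream.read(2))
    return stream.read(xlen)

# Hand-built BGZF-style header (illustrative values only)
header = bytes([0x1F, 0x8B, 0x08, 0x04, 0, 0, 0, 0, 0x00, 0xFF])
header += bytes([0x06, 0x00])            # length of EXTRA = 6, little-endian
header += b"BC" + bytes([0x02, 0x00])    # subfield id "BC", subfield length = 2
header += bytes([0x1B, 0x00])            # 2-byte subfield payload

extra = read_gzip_extra(io.BytesIO(header))
print(extra)
```

If only one byte of the length were consumed, the following read would start one byte too early and the subfield would be misaligned.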

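This is the official slide from Samtools explaining the BGZF file. In the slide, the LEN bytes are shown as | 0 | 6 |.
In the actual file, however, they are | 6 | 0 |. I suspect the actual file is correct, since 6 stored as a 2-byte little-endian value is the byte sequence 06 00, but at least there is some confusion.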
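As a quick check of the two byte orders, the following Python snippet decodes both sequences as little-endian. The variable names are only for illustration, and it assumes the slide is read literally, left to right, as the on-disk byte order.

```python
on_disk = bytes([6, 0])   # byte order seen in the actual file
on_slide = bytes([0, 6])  # byte order as drawn on the slide

little = int.from_bytes(on_disk, "little")
swapped = int.from_bytes(on_slide, "little")
print(little, swapped)  # prints "6 1536"
```

Only the order found in the actual file gives a length of 6 when decoded as little-endian.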

Perhaps I should create a pull request.

This may be related to the previous issue.
