I ran into this problem again today and was a bit annoyed, so I spent a few hours investigating. The BGZF format is widely used for biological data, and I can't afford to be interrupted every time I open one of these files.
Then I identified a very suspicious part.
According to the specification, the LEN of the EXTRA field is 2 bytes. However, only one byte is read here.
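The field is presumably little-endian, so the low byte comes first and a one-byte read still returns the right value whenever the length is below 256. I think that is why the problem went unnoticed for so long.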
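To illustrate why a one-byte read can go unnoticed, here is a minimal Python sketch. The helper names are hypothetical and are not taken from the code under discussion. It only assumes the length is stored as two bytes, least significant byte first.

```python
def read_len_one_byte(buf: bytes) -> int:
    """Hypothetical buggy reader: looks at the first byte only."""
    return buf[0]


def read_len_two_bytes(buf: bytes) -> int:
    """Hypothetical correct reader: 2-byte little-endian unsigned integer."""
    return int.from_bytes(buf[:2], "little")


# A length of 6 is stored as 06 00, so both readers agree.
small = (6).to_bytes(2, "little")
print(read_len_one_byte(small), read_len_two_bytes(small))

# A length of 300 is stored as 2c 01, and the one-byte reader is wrong.
large = (300).to_bytes(2, "little")
print(read_len_one_byte(large), read_len_two_bytes(large))
```

With small lengths the two readers cannot be told apart, which would explain why the bug only shows up on some files.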
This is the official Samtools slide explaining the BGZF format. It shows the LEN as | 0 | 6 |.
However, the actual file has | 6 | 0 |. I suspect the file is correct, since gzip stores multi-byte values least significant byte first, but the slide is confusing at the very least.
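One way to check what a real file contains is to parse the header of a block directly. The sketch below is only an illustration, not the library's code: `parse_bgzf_header` is a hypothetical helper, and the sample bytes are the 28-byte end-of-file marker block given in the SAM/BAM specification.

```python
import struct

# 28-byte BGZF end-of-file marker block, as listed in the SAM/BAM specification.
BGZF_EOF = bytes.fromhex(
    "1f8b0804"  # ID1, ID2, CM (deflate), FLG (FEXTRA set)
    "00000000"  # MTIME
    "00ff"      # XFL, OS
    "0600"      # XLEN = 6, least significant byte first
    "4243"      # SI1 'B', SI2 'C'
    "0200"      # SLEN = 2
    "1b00"      # BSIZE = 27 (total block size minus 1)
    "0300"      # empty deflate stream
    "00000000"  # CRC32
    "00000000"  # ISIZE
)


def parse_bgzf_header(block: bytes) -> dict:
    """Hypothetical helper: return XLEN and BSIZE from one BGZF block."""
    # 10-byte fixed gzip header followed by the 2-byte little-endian XLEN.
    id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack_from(
        "<4BI2BH", block, 0
    )
    if (id1, id2, cm) != (0x1F, 0x8B, 8) or not flg & 0x04:
        raise ValueError("not a gzip member with an extra field")

    # Walk the extra subfields and pick out the BGZF one ('B', 'C', SLEN = 2).
    bsize = None
    offset, end = 12, 12 + xlen
    while offset + 4 <= end:
        si1, si2, slen = struct.unpack_from("<2BH", block, offset)
        if (si1, si2, slen) == (66, 67, 2):
            (bsize,) = struct.unpack_from("<H", block, offset + 4)
        offset += 4 + slen

    return {"xlen_bytes": block[10:12], "xlen": xlen, "bsize": bsize}


header = parse_bgzf_header(BGZF_EOF)
print(header)
```

For this block the two length bytes are 06 00, which matches the | 6 | 0 | layout seen in the actual file.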
Perhaps I should create a pull request.
This may be related to the previous issue.