An idea for further optimizations: If you have long segments of characters that don’t get encoded (i.e. they’re copied 1:1 from the original string), it would be more efficient to copy them as a big batch instead of writing every single character.
For example, encoding the string `abababababababab` (either as the entire input or as a segment between two curly encodings) could be a single memcpy of 16 bytes instead of 16 individual one-byte writes.
This would require going even further down, using `Char::Reader` directly instead of `String#each_char`, so you can keep track of byte indices.
This mechanism is used in some places in stdlib, by the way, for example in `HTML.escape`.
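To make the idea concrete, here is a rough sketch of what the batched copy could look like. The encoding rule is made up for illustration (it wraps every ASCII digit in curly braces and passes everything else through), and `encode_batched` is a hypothetical name, not your actual method. The part that matters is the `run_start` bookkeeping: nothing is written while scanning unencoded characters, and the whole run is flushed with one `IO#write` when an encoded character (or the end of the string) is reached.

```crystal
# Hypothetical encoder, for illustration only: every ASCII digit becomes
# "{digit}", all other characters are copied through unchanged.
def encode_batched(input : String, io : IO) : Nil
  bytes = input.to_slice
  reader = Char::Reader.new(input)
  run_start = 0 # byte index where the current unencoded run begins

  while reader.has_next?
    char = reader.current_char
    if char.ascii_number?
      # Flush the pending run of unencoded bytes in a single write.
      io.write(bytes[run_start, reader.pos - run_start])
      io << '{' << char << '}'
      # The next run starts right after the character just encoded.
      run_start = reader.pos + reader.current_char_width
    end
    reader.next_char
  end

  # Flush whatever is left after the last encoded character.
  io.write(bytes[run_start, bytes.size - run_start])
end

def encode_batched(input : String) : String
  String.build { |io| encode_batched(input, io) }
end

puts encode_batched("ab1cd") # prints "ab{1}cd"
```

With this shape, an input like `abababababababab` never enters the `if` branch, so the only write is the final flush of all 16 bytes. Because `Char::Reader#pos` is a byte offset, the slicing also stays correct for multi-byte characters.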