I’d need to be able to modify the structure as I go. I don’t think that is possible with `Reader`? But thanks for the suggestion, there may be use cases for just parsing too.
Ahh gotcha. You might be able to pair it with an `XML::Builder`: create a reader on the source IO and a builder on the dest IO, then as you read, call the corresponding write methods, passing things through directly. If you need to make a change, write that instead of the existing node.
EDIT: Did something similar with XML::Reader/JSON::Builder and XML::Builder/JSON::PullParser for oq.
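For anyone wanting to try the reader/builder pairing described above, here is a rough sketch using only the stdlib `XML::Reader` and `XML::Builder`. The `rename_elements` helper and its `from`/`to` parameters are made up for illustration, and it only forwards elements, attributes, and text (comments, CDATA, and processing instructions are dropped), so treat it as a starting point rather than a complete pass-through.

```crystal
require "xml"

# Hypothetical helper: streams XML from `input` to `output`,
# renaming every element called `from` to `to` along the way.
def rename_elements(input : IO, output : IO, from : String, to : String) : Nil
  reader = XML::Reader.new(input)

  XML.build(output) do |xml|
    while reader.read
      case reader.node_type
      when .element?
        # Capture this before visiting attributes, since the cursor moves.
        empty = reader.empty_element?
        name = reader.name == from ? to : reader.name

        xml.start_element(name)
        if reader.move_to_first_attribute
          loop do
            xml.attribute(reader.name, reader.value)
            break unless reader.move_to_next_attribute
          end
          reader.move_to_element # back to the owning element
        end
        # Self-closing elements never produce an END_ELEMENT event.
        xml.end_element if empty
      when .end_element?
        xml.end_element
      when .text?, .significant_whitespace?
        xml.text(reader.value)
      end
    end
  end
end

source = IO::Memory.new(%(<root><old id="1">hi</old><keep/></root>))
rename_elements(source, STDOUT, "old", "new")
```

The only place the output differs from the input is the `name = ...` line, so any other edit (dropping a node, rewriting an attribute) would slot into the same `case` branch.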
crystal-html5 also supports parsing an HTML stream at two different levels.
Token-level: no tree built, constant memory. Useful when you only need the tokens and not the tree.
```crystal
HTML5.each_token(io) do |token|
  puts token.data if token.type.start_tag?
end
```
Also available as `HTML5.token_iterator(io)` if you want an `Iterator`.
SAX-style: event callbacks during tree construction. Useful when you need tree-aware events.
```crystal
class MyHandler
  include HTML5::StreamingHandler

  def on_element_open(tag, attrs, namespace)
    puts "<#{tag}>"
  end
end

doc = HTML5.stream(io, MyHandler.new)
```