Block capture in an ECR-like template engine results in out-of-order output

Context

In my armature shard, I include a templating engine which is mostly a copy/paste of ECR except it sanitizes the output for HTML by default:

<!-- This will be HTML-escaped -->
<%= content %>

<!-- This will be output raw -->
<%== content %>

I’m working on block-capture support like what the erubi gem uses (or used to use, apparently, until last month) for when you need to wrap the block’s content inside of something else. For example, a Form component could be used like this:

<%|== Form.new method: "POST", action: action do |f| %>
  <%== f.input "name" %>
<%| end %>

And the code could look something like this (note the to_s(io) method):

struct Form
  def initialize(@method : String, @action : String, &@block : self -> Nil)
  end

  def input(name)
    Input.new name
  end

  def to_s(io) : Nil
    io << "<form>"
    @block.call self
    io << "</form>"
  end

  record Input, name : String do
    def to_s(io) : Nil
      io << %{<input name="} << name << %{">}
    end
  end
end

Problem

Now, for this particular component I want to output raw HTML, but when testing with HTML sanitization enabled, the block output appears in the result ahead of the header/trailer content:

<%|= Form.new method: "POST", action: "/" do |f| %>
  <%== f.input "name" %>
<%| end %>

<!-- result — notice that the `input` tag comes before the `form` -->
  <input name="name">
&lt;form method=&quot;POST&quot; action=&quot;/&quot;&gt;&lt;/form&gt;

This doesn’t happen when I output the raw content. I’ve narrowed it down to this: to sanitize the output, the wrapper calls .to_s on the object, which renders it into a temporary String::Builder, while the block keeps piping its output directly to the original IO object. The block’s output therefore reaches the IO before the escaped wrapper content does.

So I’m trying to figure out how to make Armature::HTML::SanitizableValue pipe directly to the same IO while also HTML-escaping the object’s output, but I can’t figure out how to do that. Any ideas?
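Here is a minimal sketch of that failure mode outside of any template engine. It is not Armature’s actual code: the stripped-down Form and the `output` variable are stand-ins I made up, and the HTML.escape call only approximates what the sanitizing wrapper does.

```crystal
require "html"

# Simplified stand-in for the Form component above (no attributes).
struct Form
  def initialize(&@block : self -> Nil)
  end

  def to_s(io : IO) : Nil
    io << "<form>"
    @block.call self
    io << "</form>"
  end
end

output = IO::Memory.new

# Like the compiled template, the block writes straight to `output`.
form = Form.new do |_form|
  output << "<input>"
end

# The no-argument `to_s` renders into a temporary String::Builder, so only
# "<form></form>" lands in the temporary string. The block has already
# written "<input>" to `output` by the time the escaped string is appended.
HTML.escape form.to_s, output

puts output.to_s # <input>&lt;form&gt;&lt;/form&gt;
```

The wrapper markup is escaped as intended, but it arrives after the block’s content, which is the same ordering as in the result above.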

EDIT: Nice, I see this is from July, so this question may be resolved already. I guess I need to check dates better.

Just sort of curious, but it looks like io is being passed to HTML.escape, so could you also pass it to to_s?

def to_s(io)
  {% if T < ::Armature::Template::HTML::SafeValue %}
    @value.to_s io
  {% else %}
    ::HTML.escape @value.to_s(io), io # pass the io to `to_s`?
  {% end %}
end

Haven’t used ECR in a bit, so I’m not sure how to run it. And this might be problematic anyway, now that I look at it more closely. :pensive:

Not completely resolved. I did add block capture to Armature using the <%|== foo do %> notation, but it’s not complete yet — I couldn’t figure out sanitization for it (<%|= foo do %>).

This was actually part of the reason for the out-of-order execution I was seeing (not quite exactly this code, but similar), so I totally understand why you’d think that. Unfortunately, there’s a semantic difference here. @value.to_s(io) doesn’t return a string value for the HTML.escape method to operate on. And if it did, it would have already written itself to the IO object by the time it returned.

With few exceptions, the to_s and to_s(io) convention in Crystal revolves around the fact that to_s(io) has side effects: it writes a string representation to the IO object, usually without holding that string representation in memory, and almost always returns nil. If you define this method, you get to_s (without the IO arg) for free.
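A small sketch of that convention, using a hypothetical Point struct that has nothing to do with Armature:

```crystal
struct Point
  def initialize(@x : Int32, @y : Int32)
  end

  # Writes the representation to the IO as a side effect and returns nil.
  def to_s(io : IO) : Nil
    io << "(" << @x << ", " << @y << ")"
  end
end

point = Point.new(1, 2)

# The no-argument version comes for free: it builds a String by calling
# to_s(io) with a String::Builder.
as_string = point.to_s

# The IO version returns nil, so there is nothing to hand to HTML.escape.
io = IO::Memory.new
result = point.to_s(io)

puts as_string      # (1, 2)
puts io.to_s        # (1, 2)
puts result.inspect # nil
```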

Gotcha. I’m a bit confused about how it all fits together here, especially with ECR. I tried to run the specs in the armature shard, but they don’t compile on the newest Crystal.

From the code example given, and trying to get at the spirit of it: since escaping HTML entities needs no surrounding context, and an IO is just a stream that defines how bytes are read and written, I wonder whether you could just make an IO that escapes as it’s written to.

I tweaked your example to make it a bit more workable for me:

require "html"

class HTMLEscapeIO < IO
  def initialize(@output : IO = IO::Memory.new); end
  
  def write(slice : Bytes) : Nil
    HTML.escape(slice, @output)
  end
  
  def read(slice : Bytes) : Int32
    @output.read(slice)
  end
  
  def to_s(io)
    @output.to_s(io)
  end
end

struct Form
  def self.build(method, action, &)
    form = new(method, action)
    yield(form)
    form
  end
  
  def initialize(@method : String, @action : String)
    @fields = [] of Input
  end

  def input(name)
    @fields << Input.new(name)
  end

  def to_s(io) : Nil
    io << "<form>"
    @fields.each do |field|
      field.to_s(io)
    end
    io << "</form>"
  end

  record Input, name : String do
    def to_s(io) : Nil
      io << %{<input name="} << name << %{">}
    end
  end
end


form = Form.build("POST", "/something.cgi") do |form|
  form.input("hello")
end

escaped = HTMLEscapeIO.new
regular = IO::Memory.new

form.to_s(escaped)
form.to_s(regular)

puts regular.to_s # <form><input name="hello"></form>
puts escaped.to_s # &lt;form&gt;&lt;input name=&quot;hello&quot;&gt;&lt;/form&gt; 

Playground link: Carcin
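As a variation on the same idea, here is a hedged sketch of the escaping IO wrapping an existing IO instead of owning its own buffer, so raw and escaped writes share one stream and keep their order. EscapingWriter and `target` are names I made up for illustration, and I have not tried this inside Armature or ECR.

```crystal
require "html"

# Hypothetical write-only IO that escapes everything written through it
# and forwards the result to the wrapped IO.
class EscapingWriter < IO
  def initialize(@output : IO)
  end

  def write(slice : Bytes) : Nil
    # The escaped characters are all single-byte ASCII, so escaping
    # chunk by chunk does not split multi-byte UTF-8 sequences.
    HTML.escape(slice, @output)
  end

  def read(slice : Bytes) : Int32
    raise IO::Error.new("EscapingWriter is write-only")
  end
end

target = IO::Memory.new
escaped = EscapingWriter.new(target)

target << "<p>"          # raw write
escaped << "a < b & c"   # escaped write to the same underlying stream
target << "</p>"         # raw write

puts target.to_s # <p>a &lt; b &amp; c</p>
```

Because nothing is rendered to an intermediate String, the writes land in the order they are issued.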

I do wonder, though, if the issue is more so with the block syntax, as I noticed the example doesn’t seem to capture any fields declared in the block.

WDYT?

I tried a few variations on this, too. :slightly_smiling_face: It runs into a problem where any content that isn’t implemented in terms of that form object is then sent to the IO object out of order.

If you have this template:

<div>
  <%|== Form.build("POST", "/something.cgi") do |form| %>
    STATIC CONTENT
    <%== form.input("hello") %>
    MORE STATIC CONTENT
  <% end %>
</div>

It compiles to roughly this code:

io = IO::Memory.new
io << "<div>"
Form.build("POST", "/something.cgi") do |form|
  io << "STATIC CONTENT"
  io << form.input("hello")
  io << "MORE STATIC CONTENT"
end.to_s io
io << "</div>"

This writes the STATIC CONTENT and MORE STATIC CONTENT text out of order before the form element.
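A runnable reduction of that ordering problem, assuming a simplified Form that only stores field names (a class here for brevity, and not the exact code from the post above):

```crystal
class Form
  def self.build(&) : Form
    form = new
    yield form
    form
  end

  def initialize
    @fields = [] of String
  end

  # Mutates the form instead of writing anything to the IO.
  def input(name : String) : Nil
    @fields << name
  end

  def to_s(io : IO) : Nil
    io << "<form>"
    @fields.each { |name| io << %{<input name="} << name << %{">} }
    io << "</form>"
  end
end

io = IO::Memory.new
io << "<div>"
form = Form.build do |f|
  io << "STATIC CONTENT"      # written immediately
  f.input("hello")            # only recorded, written later
  io << "MORE STATIC CONTENT" # written immediately
end
form.to_s io # the form markup is written only now
io << "</div>"

puts io.to_s
# <div>STATIC CONTENTMORE STATIC CONTENT<form><input name="hello"></form></div>
```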

Basically, calling form.input can’t be used to mutate the state of the form, but instead needs to instantiate an object to be serialized to the IO object directly because that’s what everything else in the block is doing: all static content as well as <%= and <%== interpolations.

I’m also not a huge fan of rendering to an in-memory buffer if it can be avoided. Armature (and ECR) write directly to an IO object without buffering. This is excellent for 2 reasons:

  1. It avoids spending CPU time on heap allocations, especially useful if you’re rendering layers and layers of components. Heap allocations while rendering HTML probably aren’t going to bottleneck your app but, all else being equal, not allocating on the heap is typically a whole lot faster than allocating on the heap.
  2. If you’re rendering a lot of data, you don’t have to hold the entire dataset in memory in any form (either model objects or the string to send over the wire). You can see an example in this tweet where I fetched a year’s worth of orders, serialized it into 286MB of JSON, and never consumed more than 3.1MB of RAM from start to finish. The reason that works is that to_json(io) doesn’t keep the string representation in memory. Armature follows the exact same idea for HTML templates for the same reasons. No matter how many records my query returns, only one of them needs to be held in RAM at a time, after which point it is serialized to the client and discarded.
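To illustrate the second point with a sketch: the Order record and render method below are hypothetical, and an IO::Memory stands in for what would normally be a socket or file. Each record is produced lazily, written to the IO, and then dropped.

```crystal
record Order, id : Int32 do
  def to_s(io : IO) : Nil
    io << "<li>Order #" << id << "</li>"
  end
end

# Accepts anything that responds to `each`, including a lazy iterator.
def render(orders, io : IO) : Nil
  io << "<ul>"
  orders.each { |order| order.to_s io }
  io << "</ul>"
end

io = IO::Memory.new

# The iterator creates one Order at a time; no array of orders and no
# full HTML string is ever built by `render` itself.
lazy_orders = (1..3).each.map { |id| Order.new(id) }
render(lazy_orders, io)

puts io.to_s
# <ul><li>Order #1</li><li>Order #2</li><li>Order #3</li></ul>
```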

A template that sanitizes values may not be able to eliminate heap allocations entirely, but most of mine do — this line will cause a heap allocation except when the value is already a string. If these two issues can be resolved, then heap allocations could be eliminated entirely in Armature::Template:

This is actually the reason I got involved in either of those two issues. :smile:

Gotcha. Since ECR is compiled ahead of time, it becomes a bit more complicated.
The lexer also appears to need additional tokens in order to track block state. (I see you worked a bit through that in armature.)

It’s an awkward problem to consider, but I thought of an API that seems satisfactory to me: when a block is encountered in an ECR template, have that method return a Proc(IO, Nil).

This frees up the method call to perform as normal, and it can yield whatever it wants, but yields the captured IO last.

Given a template like:

<div>
  <%| form("POST", "/something.cgi") do |f| %>
    <div>
      <%= f.input("hello") %>
    </div>
  <%/ end %>
</div>

it would map to a method like:

class Builder  
  def input(name)
    "<input name='#{name}' />"
  end
end

def form(etc, name, &block : Proc(Builder, IO, Nil))
  ->(io : IO) do 
    io << "<form etc='#{etc}' name='#{name}'>"
    builder = Builder.new
    block.call(builder, io)
    io << "</form>"
  end
end

The processor appends a block parameter to the template expression, allowing a more natural processing of the stream.

The generated code then looks something like:

io = IO::Memory.new
io << "<div>\n  "
__proc__1 = form("POST", "/something.cgi") do |f, io|
  io << "\n    <div>\n      "
  f.input("hello").to_s io
  io << "\n    </div>\n  "
end
__proc__1.call(io)
io << "\n</div>\n"
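For anyone who wants to try the idea without the lexer changes, here is a self-contained sketch that writes the "generated" code by hand. The `render_form` and `inner` names are mine, and the form helper takes method/action arguments as an assumption; whitespace from the template is left out to keep the output easy to read.

```crystal
class Builder
  def input(name : String) : String
    "<input name=\"#{name}\">"
  end
end

# Returns a Proc instead of writing immediately, so the caller decides
# when the wrapper and the block content are written to the IO.
def form(method : String, action : String, &block : Builder, IO -> Nil) : Proc(IO, Nil)
  ->(io : IO) do
    io << "<form method=\"#{method}\" action=\"#{action}\">"
    block.call(Builder.new, io)
    io << "</form>"
    nil
  end
end

io = IO::Memory.new
io << "<div>"
render_form = form("POST", "/something.cgi") do |f, inner|
  # The block receives the IO last and writes its content to it.
  inner << "<div>" << f.input("hello") << "</div>"
end
render_form.call(io)
io << "</div>"

puts io.to_s
# <div><form method="POST" action="/something.cgi"><div><input name="hello"></div></form></div>
```

Nothing is written until the Proc is called, so the opening tag, the block content, and the closing tag come out in order.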

I went ahead and threw up an example here: ecr_with_blocks/test.cr at master · skinnyjames/ecr_with_blocks · GitHub

I added 2 token types to the lexer because I’m lazy, but maybe it can be done with fewer.

“I’m also not a huge fan of rendering to an in-memory buffer if it can be avoided. Armature (and ECR) write directly to an IO object without buffering.”

This brings me back to my main idea, I think. Since HTMLEscapeIO is an IO itself, it’s just proxying the buffer to HTML.escape and shouldn’t be creating intermediate strings. See ecr_with_blocks/test.cr at master · skinnyjames/ecr_with_blocks · GitHub for an example of piping escaped HTML directly to a file handle.

Please correct me if I’m wrong here.

Interesting stuff!