RFC: `with ... yield` replacement


#1

The with ... yield feature has some drawbacks that would be good to solve. To illustrate the main challenge, both for the user and for the compiler, consider the following code:

foo do
  bar
end

It is unclear (without checking the source of foo) where bar can be declared.

This uncertainty puts an upper bound on how much code can eventually be compiled or analyzed in a modular way, and blocks are widely used in Crystal code.

When a block is executed without a with ... yield, its body is resolved in the same lexical scope where the user wrote the block. But today, unless the source of foo is analyzed, there is no clue how to resolve the lookup.

So we are penalizing most block usages, generating uncertainty for the user and some extra work for the compiler.

The with ... yield was introduced to allow DSLs as in Ruby. They are a powerful tool.

Some people don’t mind having an explicit context like:

App.routes do |c|
  c.get :foo
  c.post :bar
end

But some do believe that

App.routes do
  get :foo
  post :bar
end

looks better and is more powerful.

Sadly it has the drawbacks mentioned earlier.

The proposal I want to share is to remove the with ... yield but to introduce a new syntactic convention for method calls. A method call with a trailing & would have the semantics of evaluating the block body in the context yielded by the callee method.

Note: Choosing chars for new syntax is always problematic. The core of the proposal is a new syntax for a method call that won’t clash with the rest of the language. Which characters are actually used is a separate question from the mechanism itself.

As an implementation detail, this could be done by a local transformation.

So

App.routes& do
  get :foo
  post :bar
end

could be translated to

App.routes do |c|
  c.get :foo
  c.post :bar
end

And be compiled as usual.
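For readers who want to see the two forms side by side, here is a minimal Ruby sketch (Ruby, not Crystal, because instance_exec can emulate the receiver change at runtime). The Routes class and the routes_explicit / routes_implicit helpers are made up for this illustration and are not part of the proposal. It only shows that the implicit form and its explicit rewrite record the same calls.

```ruby
# Hypothetical route collector used only for this illustration.
class Routes
  attr_reader :entries

  def initialize
    @entries = []
  end

  def get(name)
    @entries << [:get, name]
  end

  def post(name)
    @entries << [:post, name]
  end
end

# Explicit form: the block receives the context as an argument.
def routes_explicit
  routes = Routes.new
  yield routes
  routes.entries
end

# Implicit form: the block body runs with the context as self.
# instance_exec stands in for the receiver change being discussed.
def routes_implicit(&block)
  routes = Routes.new
  routes.instance_exec(&block)
  routes.entries
end

explicit = routes_explicit do |c|
  c.get :foo
  c.post :bar
end

implicit = routes_implicit do
  get :foo
  post :bar
end

puts explicit == implicit
```

The proposed transformation would do the rewrite from the second call shape to the first at compile time, so nothing like instance_exec would be needed.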

I do think that changing the current context / removing the explicit receiver is a key part of a good DSL.

Some additional considerations:

  • the user can decide whether to use implicit or explicit contexts for each invocation.
  • arguments work with the syntax.
html& do
  p& class: "foo", 2 do
    "Lorem ipsum"
  end
end
  • top level methods. Depending on the implementation they might not be callable, but I don’t think it’s terrible given the common usage. Ultimately a ::method could disambiguate.

  • additional arguments in the block. Initially I wouldn’t allow them. I haven’t seen DSLs with arguments in blocks unless that argument is the context.

  • do/end vs { ... }. Since the new syntax applies to the method name, both block syntaxes work.

  • choosing another syntax/character. Instead of & I preferred :, but it clashes with named arguments too much (in the previous html example, parsing would require lookahead for p: class: "foo").

  • Using & as a suffix can also be seen as related to the &.proc notation, which refers to the first argument of the block.


#2

Edit №2, I’m changing my mind way too fast.

Consider this syntax:

App.routes.tap &.do
  get :foo
end

Which could be understood as:

call each line in the following block with routes. prepended to it

Which is essentially the same when doing simple foo.tap &.bar:

call bar with foo. prepended to it

Update: one more example:
Update: this example makes no sense, as the method is called to nowhere.

Nonsense code
["a", "b"].map &.do
  upcase
  * 2
end

Anyway, I still think foo.tap &.do is slightly better than foo& do. The & looks like a part of foo in the latter case.


#3

Also, in your html example it’s still not clear whether do "Lorem ipsum"; end is the third argument or a p."Lorem ipsum" call.


Also: arguments could be long, and the ampersand at the very beginning (p&) can easily be missed by a developer’s eye (the lack of a space between them makes it worse).


Therefore, having the ampersand as close to the do as possible could be better in the end:

html.tap &.do
  p(class: "foo", 2).tap &.do
    # "Lorem ipsum" # `p."Lorem ipsum"`? I think you meant the next line:
    text "Lorem ipsum"
  end
end

#4

None. The string is the last expression of the block passed to the p method.

Yes. Maybe &do or do& are valid alternatives.

html &do
  p class: "foo", 2 &do
    "Lorem ipsum"
  end
end

&.do won’t work since it is (and will keep being) syntax sugar for { |x| x.do }.

Note: I don’t follow why you always use tap in your examples though. In a simple usage of an html DSL there should be no tap.


#5

I still don’t understand then. Could you please expand the code (presumably with with ... yield)?

I’m trying to preserve the existing syntax, because it’s good to stay consistent.

foo &.bar => foo { |x| x.bar }
foo &.do
  bar
  baz
end => foo { |x| x.bar; x.baz }

LGTM

That’s because I forgot that html and p methods can accept blocks. Sorry for that :sweat_smile:


#6

One problem I have with with ... yield is that it’s part of an object’s interface, but there’s no way to tell if it’s used or not as far as I can tell.

If I want to call the method foo, I have no idea if I need to use it like

foo do
  bar
end

or

foo do
  &.bar
end

#7

As far as I understand, Ruby binds method calls at runtime (obviously), which allows the notion of an abstract “receiver”. In fact, instance.method syntax is just shorthand for instance.__send__ :method.

Now, this will be challenging for Crystal, however shouldn’t it still be possible to decide the receiver at compile time? instance_eval was a bit misleading, because it takes text in Ruby, but instance_exec may just rebind self, won’t this work?


#8

The expression foo &.do already has a meaning: it is foo { |x| x.do }. That construct is already used. Period.

The only way would be to do a lookahead in the parser to tell &.do apart from &.doit. It’s better to avoid those overlaps as much as possible.


#9

A problem with &do is that it would also need a matching syntax for { }, like &{ }. Changing the method name is one rule instead of two. Still, &do and &{ could work; I’m just explaining why I was more tempted to do something in the method name.


#10

I don’t follow your example.

It’s true that a method changing the context is part of the method signature. Changing to a syntax sugar for this would eliminate that.


#11

I don’t think the implementation is much of a problem. The resolution will happen at compile time: when the block is inlined, the lexical scope plus the method body will be enough to implement the lookup for methods/vars with whatever semantics is wanted.

The important thing is that in things like

class Foo
  def m
    some_method do
      bar
    end
  end

  def bar
    ...
  end
end

One (and the compiler) will be able to assume that bar is Foo#bar.

If some_method were called with the with ... yield replacement, then the resolution of bar would also need to consider the type of the implicit block argument.
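To make the two lookup rules concrete, here is a small Ruby sketch. It uses instance_exec as a runtime stand-in for the receiver-changing call, and the Context class plus the run_plain / run_rebound helpers are hypothetical names invented for the example. In Crystal the same decision would be made at compile time.

```ruby
# Hypothetical context object that a DSL method might yield.
class Context
  def bar
    :context_bar
  end
end

class Foo
  # Plain block: bar is resolved in the lexical scope, so it is Foo#bar.
  def plain
    run_plain { bar }
  end

  # Receiver-changing block: bar is resolved on the context object.
  def rebound
    run_rebound { bar }
  end

  def bar
    :foo_bar
  end

  private

  def run_plain
    yield
  end

  def run_rebound(&block)
    Context.new.instance_exec(&block)
  end
end

p Foo.new.plain
p Foo.new.rebound
```

The block bodies are identical in both methods; only the callee decides which bar is meant, which is the information a reader (or a modular compiler) does not have at the call site today.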


#12

I apologize, but I don’t think I follow. I would like to submit another example.

class Worker
  DO_NOTHING = Proc.new { |data| data }
  def initialize data
    @data = data
    @bw = DO_NOTHING
    @aw = DO_NOTHING
  end
  def do_work
    @data = @bw.call @data
    @data = yield @data if block_given?
    @data = @aw.call @data
    self
  end
  protected def before_work &block; @bw = block end
  protected def after_work &block; @aw = block end
end

def preprocess data; data.sub 'x', '[naked preprocess]' end
class Worker
  def preprocess data; data.sub 'x', '[worker preprocess]' end
  def postprocess data; data.sub 'x', '[worker postprocess]' end
end
def postprocess data; data.sub 'x', '[naked postprocess]' end

w = Worker.new 'xxxxx'
w.instance_exec do
  before_work { |data| preprocess data }
  after_work { |data| postprocess data }
end
puts w.do_work { |data| data.sub 'x', 'z' }.instance_variable_get :@data

This is valid Ruby that produces [worker preprocess]z[worker postprocess]xx

If we comment out Worker#postprocess, we instead get [worker preprocess]z[naked postprocess]xx

I have to say this is a bit surprising, because if postprocess is not defined at all, then the exception says undefined method 'postprocess' for #<Worker:0x1bdf564> (NoMethodError), which suggests that it only searches the Worker instance, but in fact it also considers other namespaces.

What do you think, is it realistic to achieve in Crystal?


#13

In Ruby when you define a top level method it actually gets defined in Object as a private method. And all objects inherit Object. That’s why the top level method is found when you comment out the instance method.

In Crystal it’s different, but we still search the top level if we can’t find an instance method and there’s no explicit receiver.
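The Ruby half of this can be checked with a short sketch. It reuses the Worker, preprocess and postprocess names from the example above, but the method bodies are simplified stand-ins, and it assumes the code runs as a plain script so that the top-level def is really at the top level.

```ruby
# A top-level def becomes a private method on Object.
def postprocess(data)
  "[naked] #{data}"
end

class Worker
  def preprocess(data)
    "[worker] #{data}"
  end
end

w = Worker.new

# Inside instance_exec, self is the Worker and calls have no explicit receiver.
from_worker = w.instance_exec { preprocess("x") }   # found on Worker
from_object = w.instance_exec { postprocess("x") }  # inherited from Object

puts from_worker
puts from_object
```

Because Worker inherits from Object, the receiverless postprocess call is still an ordinary method lookup on the Worker instance; there is no separate search of another namespace.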


#14

Right, that makes sense. So why not just add instance_exec and maybe class_exec then instead of with self yield?


#15

Because it can’t be implemented in pure Crystal. It has to be magic, and for magic we mostly use keywords and constructs.


#16

Yes, it was an argument for a change.

What I meant was that it’s a sneaky and error-prone part of the method signature IMO. When I look at the documentation for a method, I can easily see its parameters, return value etc. but I have no easy way of telling if a &block yields with a scope or not.


#17

Now that I understand the issue let me make a suggestion. I use Ruby-based DSLs all the time, both provided by gems like Rails and made ad-hoc mostly for a) configuration (I strongly believe that config is code) and b) to better understand the problem domain.

Quoting from the original post:

App.routes& do
  get :foo
  post :bar
end

html& do
  p& class: "foo", 2 do
    "Lorem ipsum"
  end
end

I think both examples here need more context, because knowing nothing about the implementation, just by reading these DSLs I am a bit confused:

App.routes is a method that evaluates its block in the context of App, right? If this is the only method in App that does that, why is it called App.routes? If there is another such method, say App.tls, what’s there to stop users from calling get from tls and, say, cert_file from routes? Surely a better pattern would be to delegate routing information to App::Router and crypto parameters to App::TLS, which will only contain methods relevant to their fields? So, let’s do that:

App.routes do
  get :foo
  post :bar
end

class App
  @@router = Router.new
  def self.routes(&block)
    @@router.configure& &block  # what exactly are we doing here?
  end
end

You see, in Ruby there is a method to run a block with instance as a receiver: instance_exec, so we’ll use that: @@router.instance_exec &block
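Spelled out in Ruby, the delegation could look roughly like this. It is only a sketch: Router and TLS are top-level stand-ins for App::Router and App::TLS, and their methods (get, post, cert_file, entries, cert) are assumptions made for the illustration.

```ruby
# Router and TLS are hypothetical context classes; each one exposes only
# the DSL methods that belong to its own area.
class Router
  attr_reader :entries

  def initialize
    @entries = []
  end

  def get(name)
    @entries << [:get, name]
  end

  def post(name)
    @entries << [:post, name]
  end
end

class TLS
  attr_reader :cert

  def cert_file(path)
    @cert = path
  end
end

class App
  @@router = Router.new
  @@tls = TLS.new

  # Evaluate the block with the router as self, then hand the router back.
  def self.routes(&block)
    @@router.instance_exec(&block)
    @@router
  end

  # Same pattern, but the block only sees the TLS methods.
  def self.tls(&block)
    @@tls.instance_exec(&block)
    @@tls
  end
end

App.routes do
  get :foo
  post :bar
end

App.tls do
  cert_file "server.pem" # hypothetical path
end
```

With this split, calling get inside App.tls raises NoMethodError, because the TLS context simply has no such method.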

Now let’s consider the second example, some DSL to build an HTML snippet. Leaving aside p& that has no method calls, I’m also a bit confused as to what’s going on there. html must be a method call that yields its block, right? In Ruby I would do it like this:

class HTML
  def initialize *args, &block
    @root_block = Block.new *args, &block
  end
end

class Block
  def initialize *args, &block
    @child_nodes = []  # must exist before the block runs, since html and p push to it
    instance_exec &block if block
  end
  def html *args, &block
    @child_nodes.push Block::HTML.new *args, &block
  end
  def p *args, &block
    @child_nodes.push Block::P.new *args, &block
  end
end

class Block::HTML < Block; end
class Block::P < Block; end

Obviously this is more pseudocode than a real implementation, just to show the pattern.

Now, going back to the top:

It is unclear (without checking the source of foo) where bar can be declared.

My problem with with ... yield is that it can only be used for yield, but not for Proc.call. Your problem is that the receiver is not apparent to the user, right? Well, the same is true for the explicit form too:

App.routes do |c|
  c.get :foo
  c.post :bar
end

How do you know what c is without “checking the source of foo”?

class Routes
  def get(s); p routes_get: s end
  def post(s); p routes_post: s end
end

class App
  @@routes = Routes.new
  def self.get(s); p app_get: s end
  def self.post(s); p app_post: s end
  def self.routes1; yield self end
  def self.routes2; yield @@routes end
  def self.go_figure(&block : App.class | Routes ->)
    if Random.rand(2).even?
      routes1 &block
    else
      routes2 &block
    end
  end
end

App.routes1 do |c|
  c.get :foo
  c.post :bar
end

App.routes2 do |c|
  c.get :foo
  c.post :bar
end

App.go_figure do |c|
  c.get :foo
  c.post :bar
end

$ ./test.cr
{app_get: :foo}
{app_post: :bar}
{routes_get: :foo}
{routes_post: :bar}
{routes_get: :foo}
{routes_post: :bar}

$ ./test.cr
{app_get: :foo}
{app_post: :bar}
{routes_get: :foo}
{routes_post: :bar}
{app_get: :foo}
{app_post: :bar}

#18

I think the question isn’t what type gets the method. It’s who gets the method. When you see foo(), without an IDE to help you, you understand that you have to search for foo in the current scope, ancestors and finally the top level. Oh, but maybe you are in the context of a with ... yield? And how would you know that? You have to go and check the source code of the surrounding methods. This is the “bad” thing that’s being discussed: you have to do a lot of guesswork to understand what’s going on.

That said, I don’t personally think this is a problem.


#19

Yes, I understand that. In my opinion the point of a DSL isn’t to dig into its implementation, it’s the ease of use of a defined API. Of course if something goes wrong it’s helpful to be able to understand what’s going on under the hood, but if you’re already there, with ... yield doesn’t matter much at that point.


#20

I just realised what I was getting at yesterday, so I’ll mention a different option to remove the ambiguity of with ... yield: making it explicit in the method signature.

Kotlin, for example, does this with the same use case (mainly DSLs). When declaring the type of a closure, you can specify a class for the receiver, like this:

fun doIntegerOp(integer: Int, operation: Int.() -> Int): Int = integer.operation()

val three = doIntegerOp(2){ this + 1 }

The syntax on the caller side is exactly the same as for blocks without a receiver. But you can see if there’s a receiver and which type it has in the method definition. (which the IDE shows while typing)

I’m not sure which approach I prefer, to be honest. I like how this proposal reduces language complexity by completely getting rid of this construct. On the other hand, the Kotlin approach retains the arguably cleaner-looking DSL syntax without the ambiguity of the current with ... yield.