What is the benefit of inlining methods with `yield`?

I’m trying to understand the benefits of the method inlining that comes with using a method that yields. Is it compile-time performance? Runtime performance? A secret third thing?

The reason I’m thinking about this is that I keep running into problems where a method that yields cannot be recursive (it would inline infinitely) and captured blocks can’t have code that yields inside of them. I understand this pretty well, but I don’t understand why the distinction exists in the first place. I assumed it was for runtime performance, but LLVM already does automatic inlining, so maybe there are aspects I’m missing.

@asterite should have the actual answer.

I think it’s not much a benefit but a requirement to implement blocks in Crystal, especially with the next and break statements.

2 Likes

IIRC the motivation was partially runtime performance to avoid the cost of calling a method when using methods like #each that mimics iteration. So yes, inlining. This also allows to keep direct access of variables of the caller in the stack as opposed to have a reference to them in the closure block.

But also, when the block has no type restriction it was convenient for the compiler implementation to have it this way. Maybe this is more historical, not completely sure. But at the very beginning I think we didn’t even have &block syntax and it was only the yield that marked the method. As such there was no type restriction.

Although all methods are like “C++ templates”, the yielding are even more because of this. That was my lame mental model at the very beginning.

One drawback of this approach is that we don’t reutilize typing and codegen of the yielding methods.

1 Like

I guess that means it’s even adverse to compile-time performance :person_shrugging:

Oh yeah. next and break are definitely a constraint.

Do next and break not work with captured blocks? I thought they did but I don’t use them enough to remember off the top of my head.

Next yes. Break and return no.

https://crystal-lang.org/reference/1.13/syntax_and_semantics/capturing_blocks.html#break-and-next

1 Like

Fascinating, I’ve somehow never noticed (or maybe forgot about?) this distinction.

I think something just clicked for me that hadn’t before. It seems break is really about escaping the underlying while loop (or similar construct) once all the layers of blocks are unrolled?

I actually had this as a followup question in mind in posting this thread. Basically, whether it could speed up compilation times if we replaced yield entirely with captured blocks so we wouldn’t have to recompile the each method every time an Enumerable method was called. But break and return are too valuable to give up in blocks, so if that can’t be implemented without yield it’s a moot point.

If we would compile using some kind of CPS (Continuation-passing style - Wikipedia) I think we could compile once and reuse them. But the cost is that the runtime will be heavier probably. I don’t think llvm ir supports cps directly so we will need to represent them. Maybe MLIR have some support for them but that would be an entire different backend for the compiler.

Hum, naive thinking here, but… could blocks have some special signature? For example pass the control flow through the return type (and pass the return value as an out arg). Like success=0, next=-1, break=-2, return=-3, …

(Foo) -> Bar
fun (Bar*, Foo) : Int32

Then yield becomes a case and the state should bubble up until it’s meaningful and handled? Well, that’s probably where the naive thinking gets kicked by reality :sweat_smile:

Note: compiling with optimizations should still inline the blocks.

1 Like

The main thing about yield is that doing return inside it returns from the outer method. That’s not the case with closures or lambdas in general. In Ruby (and our interpreter too, if I remember correctly) that’s implemente with some thing similar to exceptions, but that’s slower than inlining.

1 Like

I’ve wondered in the past about merging procs and blocks by having blocks return a union type (T | Break | Next | Return), and have the compiler build the boilerplate code to handle it. IIRC it gets complicated quite easily, and certainly wouldn’t be as performant, but it would be nice from the perspective of semantics “a block is just a proc with control operators”.