In my opinion, it’s too easy to have an Unhandled exception in spawn while the rest of the program keeps chugging along. (I ran into this with a long-lived process, with a cleanup Fiber that died due to an unanticipated IO::Error, resulting in consequences later due to that cleanup not running…) I’d generally like to have my program die if there’s an unanticipated and uncaught exception in any Fiber, not just the main one.
I’m proposing spawn! which looks like this:
def spawn!(*args, &block)
spawn do
begin
block.call
rescue ex
STDERR.print "Unhandled exception in spawn!: "
ex.inspect_with_backtrace(STDERR)
STDERR.print "spawn!: Fatal. Dying..."
STDERR.flush
exit(1)
end
end
end
Alternatively, the docs might need to more loudly declare that every use of spawn should probably have a catch-all rescue block.
The use case certainly demands attention. But I’m not conviced about the proposed solution. spawn! is not very clear about what it’s doing and this behaviour does not seem very intuitive.
Termination of the entire process is only one possible reaction to an exception in a fiber that’s fatal and unrecoverable. There may be other, less grave conclusions in scenarios where the failure only affects part of the program or may be recovered.
I’d see a possible solution for this problem with structured concurrency (cf. [RFC] Structured Concurrency · Issue #6468 · crystal-lang/crystal · GitHub) which would allow to configure error handling for a concurrency scope.
which includes the spawn! method and macro, as well as some tests. We’re using this in production.
I agree that it usually makes sense to still do some exception handling within the Fiber. In some cases, that may be to intentionally ignore a specific exception, or even all exceptions. The spawn! really just changes the default behavior for an unhandled exception to a default that I personally prefer.
I’d rather explicitly ignore/recover like this:
spawn!
loop do
begin
# your code here
rescue ex : Exception
Log.warn { "Ignoring exception and restarting this fiber..." }
end
end
end
rather than having to remember to explicitly write a catch-all rescue in every spawn.
My operational assumption is that there’s a much higher chance of me seeing the unhandled exception if the process dies than if it’s just in the middle of the logs somewhere.
I’m not very familiar with structured concurrency, but it does sound like it could also be useful as another way of defining exception handling at spawn time!
There’s still the normal spawn. I think spawn! makes good sense. Of course there can be other reactions, but I’d say that in the majority of cases the exception is an error they forgot to catch, and just letting the fiber die quietly is a bit of a footgun.
spawn! is not as fancy as structured concurrency, but it’s only 13 lines of code and I think it implements the behaviour a lot of folks would assume.
IMO, dying on uncaught exceptions in fibers ought to be the default, but that’s another discussion.
I think the latter sentence makes a good point. I agree it’s not great that unhandled exceptions are simply ignored if they happen in a fiber other than the main one.
But I’m not sure if adding yet another method to spawn fibers is a good idea, even if brings safer semantics. It might be helpful, but it also could lead to confusion about which method does what and which one you should chose.
Another option could be a setting for the existing spawn method, with global configuration for example. Then you don’t need to bother about two different spawn methods, you can just opt in that all fibers should abort the process on an unhandled exception.
Also I think we should talk a bit more about long-term evolution of the fiber spawn API, particularly in the context of structured concurrency, before introducing something new.
We are so many methods exists a bang! variant, don’t think add a clear spawn! method hurt anythings.
Why bother with a global option instead? add a spawn! is told the user, only exception raised in this fiber will halt the running process, not others. although, i think add a raise: false option as default is acceptable, but really introduce the complexity, a new spawn! is more clearly.
I think that’s worse, then a shard that uses spawn would have to consider both options of the setting. You could argue that it forces spawn using shards to properly handle exceptions and not rely on the quietly-dying behaviour, but it’s a bit of a roundabout way to do it.
Well, spawn is going to stick around for a while, even if we get better ways of handling it.
We can’t change spawn to do what I consider The Right Thing, but we can add spawn! and add a note to spawns documentation that “you probably want to use structured concurrency or at least spawn! instead of this”.
Adding a new method means we’ll have to support it ad vitam aeternam to not break backward compatibility.
As said above it’s “only 13 lines of code” then it’s easy to make your own tailored for your application or library. Put it inside a namespace to avoid conflicts.
This is also mostly a personal preference: rescuing in a loop? what if I don’t want the job to restart? what if I’d rather have some global or specific callback to be called?
We must revisit the whole concurrency issue, and at least the failure/cancellation in a RFC before settling on something for the stdlib that would be the best for everyone!
Well, in that case it is easy enough to just rescue the exception and do what you wish. One default doesn’t preclude overriding it with something else.
This however I agree with, even if I would prefer a more user friendly default behavior. spawn! may very well not be the API we want. But on the other hand I’d prefer to kill toplevel spawn as well :)
The implementation in the first post simply dies, which I think is what spawn ought to do in the first place.
Well, if we end up at that level of BC breaking change, I can’t see why we couldn’t take spawn! down too, and enjoy it in the meantime.
Well yeah, but we can’t fix spawn for the same BC reason, so.
Yeah, no biggie for a seasoned Crystal developer, but let’s think about the new users just learning the language and reading the official documentation and only later discovering that, whoops, uncaught exceptions in spawns gets quietly ignored… spawn! might be a bit of a kludge, but it’s the kind of wart you get in every mature language.