Proposal: add spawn! to exit(1) on any unhandled exception in spawned Fiber

compumike · November 25, 2024, 5:03am

In my opinion, it’s too easy to have an Unhandled exception in spawn while the rest of the program keeps chugging along. (I ran into this with a long-lived process, with a cleanup Fiber that died due to an unanticipated IO::Error, resulting in consequences later due to that cleanup not running…) I’d generally like to have my program die if there’s an unanticipated and uncaught exception in any Fiber, not just the main one.

I’m proposing spawn! which looks like this:

def spawn!(*args, &block)
  spawn do
    begin
      block.call
    rescue ex
      STDERR.print "Unhandled exception in spawn!: "
      ex.inspect_with_backtrace(STDERR)
      STDERR.print "spawn!: Fatal. Dying..."
      STDERR.flush
      exit(1)
    end
  end
end

Alternatively, the docs might need to more loudly declare that every use of spawn should probably have a catch-all rescue block.

straight-shoota · November 25, 2024, 12:31pm

The use case certainly demands attention. But I’m not conviced about the proposed solution. spawn! is not very clear about what it’s doing and this behaviour does not seem very intuitive.
Termination of the entire process is only one possible reaction to an exception in a fiber that’s fatal and unrecoverable. There may be other, less grave conclusions in scenarios where the failure only affects part of the program or may be recovered.
I’d see a possible solution for this problem with structured concurrency (cf. [RFC] Structured Concurrency · Issue #6468 · crystal-lang/crystal · GitHub) which would allow to configure error handling for a concurrency scope.

zw963 · November 25, 2024, 1:40pm

The spawn! name and usage is clearly and useful, why introduce more complexity for such a simple proposal?

compumike · December 24, 2024, 6:47pm

I’ve just published the spawn-bang shard at:

which includes the spawn! method and macro, as well as some tests. We’re using this in production.

I agree that it usually makes sense to still do some exception handling within the Fiber. In some cases, that may be to intentionally ignore a specific exception, or even all exceptions. The spawn! really just changes the default behavior for an unhandled exception to a default that I personally prefer.

I’d rather explicitly ignore/recover like this:

spawn!
  loop do
    begin
      # your code here
    rescue ex : Exception
      Log.warn { "Ignoring exception and restarting this fiber..." }
    end
  end
end

rather than having to remember to explicitly write a catch-all rescue in every spawn.

My operational assumption is that there’s a much higher chance of me seeing the unhandled exception if the process dies than if it’s just in the middle of the logs somewhere.

I’m not very familiar with structured concurrency, but it does sound like it could also be useful as another way of defining exception handling at spawn time!

Xen · December 26, 2024, 10:57pm

There’s still the normal spawn. I think spawn! makes good sense. Of course there can be other reactions, but I’d say that in the majority of cases the exception is an error they forgot to catch, and just letting the fiber die quietly is a bit of a footgun.

spawn! is not as fancy as structured concurrency, but it’s only 13 lines of code and I think it implements the behaviour a lot of folks would assume.

IMO, dying on uncaught exceptions in fibers ought to be the default, but that’s another discussion.

zw963 · December 27, 2024, 3:31am

straight-shoota · December 27, 2024, 2:52pm

I think the latter sentence makes a good point. I agree it’s not great that unhandled exceptions are simply ignored if they happen in a fiber other than the main one.

But I’m not sure if adding yet another method to spawn fibers is a good idea, even if brings safer semantics. It might be helpful, but it also could lead to confusion about which method does what and which one you should chose.

Another option could be a setting for the existing spawn method, with global configuration for example. Then you don’t need to bother about two different spawn methods, you can just opt in that all fibers should abort the process on an unhandled exception.

Also I think we should talk a bit more about long-term evolution of the fiber spawn API, particularly in the context of structured concurrency, before introducing something new.

zw963 · December 27, 2024, 3:50pm

We are so many methods exists a bang! variant, don’t think add a clear spawn! method hurt anythings.

Why bother with a global option instead? add a spawn! is told the user, only exception raised in this fiber will halt the running process, not others. although, i think add a raise: false option as default is acceptable, but really introduce the complexity, a new spawn! is more clearly.

Xen · December 27, 2024, 5:12pm

I think that’s worse, then a shard that uses spawn would have to consider both options of the setting. You could argue that it forces spawn using shards to properly handle exceptions and not rely on the quietly-dying behaviour, but it’s a bit of a roundabout way to do it.

Well, spawn is going to stick around for a while, even if we get better ways of handling it.

We can’t change spawn to do what I consider The Right Thing, but we can add spawn! and add a note to spawns documentation that “you probably want to use structured concurrency or at least spawn! instead of this”.

ysbaddaden · December 29, 2024, 1:39pm

My 2 cents:

Adding a new method means we’ll have to support it ad vitam aeternam to not break backward compatibility.

As said above it’s “only 13 lines of code” then it’s easy to make your own tailored for your application or library. Put it inside a namespace to avoid conflicts.

This is also mostly a personal preference: rescuing in a loop? what if I don’t want the job to restart? what if I’d rather have some global or specific callback to be called?

We must revisit the whole concurrency issue, and at least the failure/cancellation in a RFC before settling on something for the stdlib that would be the best for everyone!

yxhuvud · December 29, 2024, 3:35pm

Well, in that case it is easy enough to just rescue the exception and do what you wish. One default doesn’t preclude overriding it with something else.

This however I agree with, even if I would prefer a more user friendly default behavior. spawn! may very well not be the API we want. But on the other hand I’d prefer to kill toplevel spawn as well :)

Xen · December 29, 2024, 7:20pm

The implementation in the first post simply dies, which I think is what spawn ought to do in the first place.

Well, if we end up at that level of BC breaking change, I can’t see why we couldn’t take spawn! down too, and enjoy it in the meantime.

Well yeah, but we can’t fix spawn for the same BC reason, so.

Yeah, no biggie for a seasoned Crystal developer, but let’s think about the new users just learning the language and reading the official documentation and only later discovering that, whoops, uncaught exceptions in spawns gets quietly ignored… spawn! might be a bit of a kludge, but it’s the kind of wart you get in every mature language.

Topic		Replies	Views
Exception inside a spawn block should crash the program Help & Support	7	413	October 1, 2021
Spawn vs Fiber.new Help & Support	2	457	July 3, 2021
First steps with spawn \| -Dpreview_mt behaviour Help & Support	21	290	February 6, 2024
Use ensure clause in spawn block, can't guarantee code in ensure always be execute, is expected behavior? Help & Support	6	312	November 12, 2022
Multi-thread program: Unhandled exception: Cannot allocate new fiber stack: Cannot allocate memory (Errno) Help & Support	21	1843	March 8, 2021

Proposal: add spawn! to exit(1) on any unhandled exception in spawned Fiber

Related topics