Safety guarantees of stdlib data structures

straight-shoota · October 30, 2024, 6:09pm

I have been looking through stdlib to gauge thread-safety issues on the route to multi-threading support.

Without much suprise, most basic data structures like Array are not thread safe. We’re aware of that.

A remarkable insight however is that if a component is not thread safe, it has a good chance to not even be concurrency safe.^[1]
This is an implicit consequence if a fiber swap could somehow happen inside a section of code that expects continuity.
An obvious cause is direct involvement of the event loop, such as IO operations (for example: IO::Buffered).
However, any yielding method can potentially introduce a fiber swap. There’s no way to know or restrict what code executes in the block. It might inject user code that leads to a fiber swap right within a vulnerable section of code.

This applies to all kinds of things in stdlib, in particular collection data structures like Array or Deque. They can break in unexpected ways with perfectly normal code, in a single-threaded environment.
I’ve put some demonstration examples in Concurrency-safety in Crystal stdlib · GitHub

These data structures are not protected against modifications while being iterated. Modification can happen inside the iteration in the same fiber, which might be easy to spot, but not necessarily if the code is spread out.
It’s even more insidious when different fibers are involved.

This is an issue even in single-threaded environments, and technically already without any concurrency. So this aspect should be discussed separately, but in conjuction with efforts for thread-safety.

I suppose it should be possible to introduce more safety guarantees. Maybe it’s not always really clear what’s the expected outcome. But in the worst case, raising on concurrent access would be preferable to inventing invalid data.
The scope is unfortunately pretty wide, involving lots of individual issues.
Some may be relatively easy to improve, others might require a lot of work and/or introduce other problems such as sever performance degradation. So it might be about weighing the options.
This thread is supposed to start a general discussion about this problem, while we can address individual aspects in more detailled issue threads.

Concurrency safety refers to multiple fibers, but still in the same thread. So no parallel access, only sequential intermixing of parts from different fibers at preemption points. ↩︎

rob · November 1, 2024, 8:59am

Interesting examples! Does the “made up data” always end up being zero or is that a consequence of the heap being essentially blank in these demonstrations?

Every language I’ve ever used came with a caution about iterating an array: be careful modifying it while you iterate.

Sometimes the behavior is undetermined! In C, etc, it’s not uncommon to see hacks which push stuff into an array in the middle of a while loop. For some reason especially in embedded code I’m used to seeing this kind of thing. It’s a clever algorithmic shortcut (clever here being a negative attribute).

I don’t think the goal of a concurrent stdlib needs to be “no footguns,” but I do think the paradigms need to be consistent enough and the footguns need to be well understood in the literature.

If plucking something out of an array while it’s being each’d results in a weird zero, that’s mostly fine with me. If it pulls data from memory or the heap somehow, then it seems like a roadmap to a zero-day exploit someday.

straight-shoota · November 1, 2024, 9:39am

In this particular example it’s zeroed memory. But since this can read outside the allocated buffer it can return arbitrary data or segfault.

straight-shoota · November 1, 2024, 9:45am

Yeah, I agree. We cannot bubble wrap everything. That would be impractical.

And I think it’s totally acceptable that mutating an array why iterating it can have unexpected results. But I believe it should not be able to crash the program via invalid memory access. At least not in a single-threaded environment. With multi-threading it would be much harder to avoid race conditions from parallel access, so I’m not sure any such guarantees would be practical there.

rob · November 1, 2024, 1:40pm

Has there been a survey of the comparative landscape?

I know a lot of our stdlib was inspired by or even copied from Ruby at one point. I also know that a lot of the Ruby stdlib is implemented in C, which may be where they dealt with it. What does Ruby provide here?

What does a survey of other modern languages show here? Go, Elixir, and (kind of) JavaScript emphasize concurrent programming. Swift emphasizes type strictness(?)

I’m only peripherally familiar with everything but JavaScript, and I don’t think I could communicate what happens even there except to say, “probably don’t do that”.

straight-shoota · November 1, 2024, 6:44pm

Yes, this would certainly be helpful. Hasn’t happend yet, though.
Maybe we can find some existing material on this…

straight-shoota · November 1, 2024, 6:46pm

This isn’t entirely about concurrency, but also non-concurrent footguns.
But concurrency is still a main factor I believe.

And unfortunately the story in CRuby (the reference implementation) is not great (or maybe it is because it avoids many issues). The Global VM Lock limits execution of VM code to one active thread at a time (other threads may do other work such as IO). So there’s no real parallelism, just concurrency similar to single-threaded Crystal.

Parallelism is not at the core of Ruby, and even though implementations like JRuby have some better capabilities (being based on the JVM), it’s still not really represented in the standard library. However, being run in a VM the program is less likely to crash due to concurrency issues, raising an exception instead.

This document has some details:

I think JavaScript is similar in that at least in the browser it typically runs only a single thread.

HertzDevil · November 1, 2024, 6:57pm

It’s one Ractor. Threads within a Ractor share a global VM lock, multiple Ractors can execute in parallel. Ractor.new(&) also errors if any variable inside the block forms a closure, apparently to prevent some forms of unsafe concurrent access. Instead data sharing is done by #send and #take.

All this is in addition to Ruby Fibers, however the prelude doesn’t ship any Fiber::Scheduler so I’m not sure if they could interact with Ractors in some non-trivial manner (e.g. transferring fibers between Ractors).

The ability to forbid a closure might be interesting to have. The compiler already handles this for Procs across the extern boundary.

yxhuvud · November 2, 2024, 8:32am

I’d rather have a crash, than a dirty read like the ones reading 0 in your examples. It is much, much better that it fails than continue on with memory corruption. Crashes are good because they are typically easy to locate. Memory corruption is a lot harder to track…

I wish there were some way to forbid access of variables defined in a different fiber/thread unless explicitly whitelisted to allow that. But that sounds nontrivial to build and borrow checker territory. Though it seems that (in addition to rust, obviously) C# has something alike that: A comparison of Rust’s borrow checker to the one in C# | em-tg . So perhaps that could be something to be inspired of.

But back to the examples. I’m ok with basic datastructures like array and deque not being safe in this respect (though perhaps we need variants that are safer), but the buffered IO example (which can probably be expanded with variants actually emitting writes to disk) looks a lot more counter to expectations. That I’d expect to be safe. I’d rather not have buffered IO at all than have concurrency unsafety around them.

skinnyjames · November 2, 2024, 10:21am

Just for clarification, using channels (order of operations?) to communicate between fibers still works, right?

This works they way I’d expect when moving data between fibers

ary = [1, 2, 3]
ch = Channel(Nil).new
ch2 = Channel(Int32).new

spawn do
  loop do
    ch.receive
    ch2.send(ary.shift)
  end
end

a = ary.map do |i|
  ch.send nil
  ch2.receive
end

puts a # [1,2,3]
puts ary # []

straight-shoota · November 4, 2024, 11:28am

Yes, Channel is a safe mechanism to communicate between fibers.

straight-shoota · November 4, 2024, 11:34am

Yes, definitely a crash is a better outcome than silently corrupted data. Even better would be a recoverable exception. Or just everything working safely =)

I think it should be clear that we can never accept the standard library to produce any kind of memory corruption (at least not without unprotected parallel access). It must have at least as much resiliency as to recognize a memory issue and produce an error.

ysbaddaden · November 4, 2024, 1:16pm

There’s a whole book about Ruby being thread unsafe, despite the GIL: Working With Ruby Threads.

Ractors, as outlined by @HertzDevil above, are a share nothing with copy/move semantics alternative to the Thread object + exception for “shareable objects” (similar to Sync in Rust?) which includes immutable data.

yxhuvud · November 4, 2024, 5:07pm

I find this assertion way, way stronger than what I think is a reasonable tradeoff from a performance perspective. I’d want Crystal to provide safe and easy to use ways of doing multi threaded programming, and that the documentation and type system would push towards safe variants where reasonable, and allow unsafe workarounds. But I don’t want all actions on anything that handles any sort of state to be wrapped in locks, which is what you essentially propose.

straight-shoota · November 4, 2024, 6:39pm

After writing that comment I realized that I missed naming the essential restriction: this only applies to single-threaded access. I’ve sinced updated the comment. Not sure if you noticed that before writing your reply.

So yeah, I agree that it’s not feasible to wrap everything in bubblewraps for complete thread safety.

But I think we should be able to make stdlib safe in single-threaded environments, i.e. without parallel access. That applies in multi-threaded environments to memory that’s only accessed from a single fiber or a set of fibers which are prevented from running in parallel.

sol.vin · November 5, 2024, 4:18pm

How possible is it to “detect” that a piece of memory is being touched in multiple fibers? Like given a spawn block and all the other code can we detect that a bit of memory is referenced in both? Maybe there is an annotation or something that could be made to make a variable or bit of memory as “only to be used in a single thread” so it could be easier to detect when a programmer made a mistake? The only language I have seen get really interested in protecting memory in any way is something like Rust, so I don’t have much of a reference to go off of how other languages would want to protect memory.

straight-shoota · November 5, 2024, 7:51pm

I’ve opened a specific issue about Array#map:

github.com/crystal-lang/crystal

`Array#map` is vulnerable to concurrent mutation

opened 02:12PM - 05 Nov 24 UTC

straight-shoota

kind:bug topic:stdlib:collection

`Array#map(&)` yields each element to transform it in the block. If the block pe…rforms mutating operations on the array, it can lead to invalid memory access or corrupted data. The same applies to `Array#map_with_index`, and I mean both when I refer to `#map`. This is discussed as an example in https://forum.crystal-lang.org/t/safety-guarantees-of-stdlib-data-structures/7364 A simple reproduction of an example with data corruption: ```cr ary = [1, 2, 3] ary.map do |e| ary.pop if e == 1 e end # => [1, 2, 0] ``` The cause is that `Array#map` expects `size` to be static during its runtime. While this is generally a reasonable expectation, it's impossible to exclude that the block mutates the array's size. `Array#map` is an optimized override. The base implementation of `Enumerable#map` does not have this bug because it does not assume any size. It grows the buffer on demand. The easiest solution is to drop the implementation of `Array#map` in order to fall back to `Enumerable#map`. We can still optimize a bit by using the current size as size hint because usually it's gonna be stable.

Fryguy · November 12, 2024, 3:49pm

I was surprised this wasn’t mentioned, but Ruby has kinda solved this a different way via the community rallying around concurrent-ruby, which includes thread-safe variants of primitive types as well as a laundry list of other Concurrency tools.

Topic		Replies	Views
Raise if containers accessed while being "changed"? Crystal Contrib	3	494	November 2, 2021
Charting the route to multi-threading support Crystal Contrib	4	402	October 23, 2024
Introducing the "Sync" shard News	2	197	May 17, 2025
Is concurrency broken? Community	14	1510	October 28, 2019
Blog: Parallelism in Crystal Official	6	959	September 13, 2019

Safety guarantees of stdlib data structures

Related topics