The Crystal Programming Language Forum

Are Fibers dangerous in Crystal?

This thread seems appropriate to point out a recent study of real-world concurrency bugs in Go.

The paper is available at https://songlh.github.io/paper/go-study.pdf, and there is a nice wrap-up blog post about it: https://blog.acolyer.org/2019/05/17/understanding-real-world-concurrency-bugs-in-go/.

Several projects are examined to find out whether concurrency bugs are related to channels or to shared memory.

Some interesting facts (see the references for a nicer overview):

  • 38.6% of examined bugs involve message passing.
  • Message passing seems to introduce more blocking issues (e.g. starving coroutines) than memory sharing does.
  • Go’s runtime race detector
    • detected 2 out of 21 reproduced “blocking” bugs.
    • detected half of reproduced “non-blocking” bugs.

Among the causes, the authors point out:

  • Coroutine creation with closures
  • Buffered vs unbuffered channel implications
  • Usage of select

It’s also worth mentioning a few things:

  • Crystal is still pretty young and its usage, compared to Go and Ruby, is not that high. That’s why there doesn’t seem to be a lot of code out there taking the most out of spawn and fibers.
  • Crystal runs on a single thread so even though data races are possible (like ko1’s example) they are less common: sharing data is not a problem right now (for example two fibers adding elements to a same array).
  • Crystal might be able to provide better abstractions compared to Go because of its type system and features (generics, modules, overloads, etc.). And I expect the standard library to provide nice abstractions for dealing with the most common concurrency patterns. For example, an Actor library would be one such approach. If users are encouraged to use these patterns, then the amount of bugs might decrease.
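For illustration, the Actor pattern mentioned above can be sketched in a few lines of Ruby, using Thread and Queue as stand-ins for Crystal's fibers and channels (the class and method names here are hypothetical, not from any existing library):

```ruby
# A minimal actor: it owns its state and processes one message at a
# time from a mailbox, so callers never touch the state concurrently.
class CounterActor
  def initialize
    @mailbox = Queue.new # Queue is thread safe in Ruby's stdlib
    @count = 0
    @thread = Thread.new { run }
  end

  def increment
    @mailbox << [:increment, nil]
  end

  # Synchronous query: send a reply queue along with the message.
  def value
    reply = Queue.new
    @mailbox << [:value, reply]
    reply.pop
  end

  def stop
    @mailbox << [:stop, nil]
    @thread.join
  end

  private

  def run
    loop do
      message, reply = @mailbox.pop
      case message
      when :increment then @count += 1
      when :value     then reply << @count
      when :stop      then break
      end
    end
  end
end

actor = CounterActor.new
100.times { actor.increment }
puts actor.value # => 100 (the mailbox is FIFO, so all increments run first)
actor.stop
```

The point is that all mutation happens on the actor's own thread, so no user-visible locking is needed.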

I use fibers inside the Task Monads https://github.com/alex-lairan/monads/blob/master/src/monads/task.cr

The advantage is that you don’t have to think about concurrency: you just say you have a task, it may fail, and you do something else. When you need the data from the task, there are two possibilities:
1 - The task is not finished -> wait for the end of the task, then return the data
2 - The task is finished -> return the data

When I communicate with the database, I always create a task :)
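The two cases above can be sketched in plain Ruby as a thread-backed future (this is just an illustration of the idea, not the fiber-based Task monad from the linked library):

```ruby
# A minimal thread-backed Task: start the work immediately,
# block only when the result is actually needed.
class Task
  def initialize(&block)
    @thread = Thread.new(&block)
  end

  # Case 1: not finished -> wait for the end, then return the data.
  # Case 2: finished     -> return the data immediately.
  def value
    @thread.value
  end
end

task = Task.new { 2 + 2 } # e.g. a database query in practice
# ... do other work concurrently ...
puts task.value # => 4
```

Thread#value joins the thread and returns the block's result, which gives exactly the wait-or-return behavior described above.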

Could something like STM in Haskell and Clojure be implemented in Crystal? I don’t know if it is even possible, or how hard it would be. But that might turn out to be a good solution.

Interesting! Is Mutex shared between Fibers and Threads? (maybe the same situation as Ruby’s plan)

Yes. “unless there is an IO, Channel operation or Fiber.yield” is key. My first example shows unexpected IO (the program is small, so it is easy to find out). Of course, it is safer than Threads.

Makes sense. I’m very excited to see it.

Thank you for sharing this great summary. I only knew the title of this paper but haven’t read it yet.

BTW (completely off-topic)

Message passing seems to introduce more blocking issues (e.g. starving coroutines) than memory sharing does.

Blocking issues are easier to debug than data-race issues because we can see the backtraces.

One comment (off-topic too. Sorry):

Haskell and Clojure are basically immutable languages, and introducing STM is a good solution because STM data is protected by transactions. In other words, all mutable data is forced by the language to be protected with STM. I like this approach very much.

Crystal (and many other languages, including Ruby with threads) allows mutating shared data. This introduces data-race problems.

With the Guild abstraction, I want to achieve both casual mutable programming and dependable concurrent programs in Ruby.

@ko1 Wouldn’t most pure Ruby (not C extensions) run correctly multithreaded without the GIL if Mutex were Fiber safe? C extensions could use a recursion-safe global lock, with additional functions to change lock granularity such as:

# Method is thread safe.  No locking.
rb_method_thread_safe(...)
# Use a specific Mutex instead of the global default.
rb_method_mutex(...)

# Methods defined after any of these calls inherit
# its setting for the particular class.
rb_class_use_global_mutex(klass) 
rb_class_use_per_class_mutex(klass, mutex_name) 
rb_class_use_per_instance_mutex(klass, mutex_name)
# Custom mutex possibly shared for the entire library.
rb_class_use_mutex(klass, mutex)
rb_class_thread_safe(klass)

Benefits:

  • I think this would work without modifying existing Ruby programs.
  • Guilds and redesigning existing programs aren’t necessary.
  • C extensions have backwards compatibility.
  • Minimal changes are necessary to C extensions for better performance.

Drawbacks:

  • There are probably a few more places that need thread safety.

In Ruby, nothing is thread safe; things only appear safe because of the GIL. Having thread-safe mutexes doesn’t help by itself. At least common data types like Array need to be thread safe as well.

All of the Ruby programs I’ve worked with used Mutexes around shared data structures to avoid race conditions. Modifying an Array inside a Mutex is safe, correct?

Yes, but then you need to guard every operation on a shared data structure with a mutex. Even something as simple as array[0]. Having to implement that in user code would be silly and error prone. That’s why you use thread-safe data structures for this.
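Ruby's standard library already ships one such structure: Queue is thread safe out of the box, whereas a bare Array needs a Mutex around every access. A small comparison sketch:

```ruby
# Unsafe by default: every access to the shared Array must hold the lock.
array = []
lock = Mutex.new
threads = 4.times.map do
  Thread.new do
    250.times { lock.synchronize { array << 1 } }
  end
end
threads.each(&:join)
puts array.size # => 1000

# Thread safe by construction: Queue synchronizes internally,
# so no user-level locking is needed.
queue = Queue.new
threads = 4.times.map do
  Thread.new { 250.times { queue << 1 } }
end
threads.each(&:join)
puts queue.size # => 1000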


With the GIL

  • Race conditions can logically corrupt data, which may or may not raise an exception.

Without the GIL:

  • Race conditions tend to corrupt more data and crash the program faster.

Either way, Ruby programs need either concurrent data structures such as Queue, or Mutexes. Existing threaded programs already use them.

I’m curious about the number of existing threaded Ruby programs and libraries that would safely run without the GIL and without modification, thanks to their use of Queue, Mutex and testing on JRuby.

Many popular Ruby programs and libraries have already done the work of wrapping shared data in Mutexes or using concurrency-safe data structures for JRuby compatibility. If they aren’t safe, how do they work on JRuby or other fully threaded Ruby VMs without a GIL?

Mutex and Fiber are completely different things. In current Ruby, Fiber is a (semi-)coroutine, and the user can (must) switch fibers explicitly. In other words, the programmer must schedule fibers completely, and must not switch fibers during atomic operations, as in the example above (or programmers need to introduce a Mutex-like synchronization mechanism themselves).
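Since fiber switches in Ruby are fully explicit, an “atomic” section is simply any stretch of code with no resume/yield in it. A small sketch:

```ruby
person = ["Alice", 10]

updater = Fiber.new do
  person[0] = "Bob" # no Fiber.yield between these two writes,
  person[1] = 20    # so no other fiber can observe a halfway state
  Fiber.yield
end

updater.resume # runs the fiber body until the explicit Fiber.yield
p person # => ["Bob", 20]
```

A preemptive thread could be interrupted between the two writes; a fiber cannot, because it only gives up control at the yield point it names itself.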

Do you mean thread-safe?

(1) On MRI (the ruby command), there is a GIL, and operations written in C functions are thread-safe in many cases (*1). In other words, programs written in Ruby are not thread-safe.

a = [1, 2] # assumption: a[1] should be the doubled value of a[0]
Thread.new{
  p(a[0] = 10) # debug output
  a[1] = 20
}
Thread.new{
  p a # observe an array.
      # The programmer expects "a[1] should be the doubled value of a[0]", but ...
}
a[0] = 100
a[1] = 200

This program is not thread-safe and can violate the assumption "a[1] should be the doubled value of a[0]".
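For completeness, guarding every access with a Mutex restores the invariant in that example. A minimal fix sketch:

```ruby
lock = Mutex.new
a = [1, 2] # invariant: a[1] is the doubled value of a[0]

writer = Thread.new do
  lock.synchronize do
    a[0] = 10
    a[1] = 20 # both writes happen atomically under the lock
  end
end

reader = Thread.new do
  lock.synchronize do
    # Either sees [1, 2] or [10, 20], never a mix.
    raise "invariant broken" unless a[1] == a[0] * 2
  end
end

[writer, reader].each(&:join)
p a # => [10, 20]
```

This is exactly the kind of manual guarding the earlier posts argue should live inside thread-safe data structures rather than in user code.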

(*1) If a method written in C calls Ruby methods, it can switch threads, so it is not thread-safe.

(2) JRuby and other implementations don’t have a GIL, and basic operations may not be thread-safe (I don’t know the details).

The puma web application server supports threaded execution, and I heard that the recent Ruby on Rails framework supports multi-threaded execution thanks to many efforts. I think much code is already thread-safe.

I respect these efforts, but I suspect there are more thread-safety bugs :stuck_out_tongue: because it is too tough to expose all bugs of this kind.

@ko1

In 2010 I used rails with puma and rainbows on a large scale web site. Google reported around 400-600m page views per month but my memory is a little foggy since I was mostly concerned with request latency. Several times that number of ajax queries were handled by unicorn/puma or later rainbows in thread pool mode. The distinction between using unicorn or rainbows came down to CPU use. The load balancer sent CPU bound request such as rails page rendering to unicorn clusters. Puma or later rainbows handled workloads with little rendering such as ajax queries that spit out json from the db’s.

Every page was thread safe and callable from either cluster and we often used the load balancers to switch requests between them based on various metrics.

We also used a custom queueing system (pre-Sidekiq) that ran each job in its own thread. Lots of Ruby libraries were used. All of them were thread safe.

Cross testing was done on jruby and rubinius with real threads but for our workload most requests were faster on MRI.

Lots of other people working on jruby, rubinius and even MRI had to add thread safety to their code a long time ago. Threads on MRI still break applications without locks or queues even with the GIL.

To me this issue was solved ~10 years ago with ruby libraries. MRI only needs to remove the GIL to keep pace with other ruby interpreters.

@ko1 Why not copy what jruby did with the runtime guarantees and allow existing thread safe ruby programs to run without the GIL?

More information here.

@ko1 the blog post is finally out: Parallelism in Crystal

person = ["Alice", 10] # name and age

# I/O simulation method
def search_age_in_internet(name)
  pp "Searching for.. #{name}"
  case name
  when "Bob"
    20
  when "Carol"
    30
  else
    0
  end
end


spawn do
  person[0] = "Bob"
  person[1] = search_age_in_internet("Bob") #=> 20
end

spawn do
  person[0] = "Carol"
  person[1] = search_age_in_internet("Carol") #=> 30
end

Fiber.yield
p person

This outputs

["Carol", 30]

I removed the Fiber.yield in search_age_in_internet.

Is this a solution?