How to add command line options to specify the number of threads?

I am creating a command line tool.
In this tool, I want to add a command line option to set the number of threads.

At the moment, I have something like this in mind.

First, add add_worker and remove_worker to Crystal::Scheduler.

class Crystal::Scheduler
  def self.add_worker
    pending = Atomic(Int32).new(1)
    th = Thread.new do
      scheduler = Thread.current.scheduler
      pending.sub(1)
      scheduler.run_loop
    end
    @@workers.not_nil! << th
    while pending.get > 0
      Fiber.yield
    end
  end

  def self.remove_worker
    return if @@workers.not_nil!.size <= 1
    @@workers.not_nil!.pop
  end
end

Next, write an option parser.

on("-t", "--threads INT", "Number of threads [4]") { |v| set_threads(v.to_i) }

# Adjusts the worker count relative to the default of 4.
private def set_threads(n)
  if n > 4
    (n - 4).times { Crystal::Scheduler.add_worker }
  elsif n == 4
    # Already at the default worker count; nothing to do.
  elsif n > 0
    (4 - n).times { Crystal::Scheduler.remove_worker }
  else
    Utils.print_error!("Invalid number of threads: #{n}")
  end
end

This somehow works (or appears to work), but it is not reliable.

How would you implement this feature?

@@workers is generated here

It may be easy to avoid assigning jobs to some workers, but that will not increase the number of workers.

I know it is dangerous to change the number of workers during execution. I want to set it only once, when the command starts (through an option, not an environment variable).

Aside from the usual CRYSTAL_WORKERS environment variable, this is how I did it with Benben (“--jobs” controls both the worker threads and the number of fibers spawned in Benben): kludge. I’ve confirmed it to work, though I’m not particularly happy with how hacky it feels.

Note you don’t have everything at your disposal that early, so you may need to use LibC for a few things like I did.
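
For comparison, here is a minimal sketch of a different, less invasive technique than the kludge mentioned above: parse the option normally, and if CRYSTAL_WORKERS does not already match, re-exec the same binary with the variable set. It assumes the program is built with -Dpreview_mt, so that the runtime reads CRYSTAL_WORKERS at startup. The helper names `requested_threads` and `worker_env` are hypothetical, and this is not how Benben does it.

```crystal
require "option_parser"

# Hypothetical helper: returns the value given to -t/--threads,
# or nil when the option is absent. Aborts on invalid input.
def requested_threads(args : Array(String)) : Int32?
  threads = nil
  OptionParser.parse(args) do |parser|
    parser.on("-t INT", "--threads INT", "Number of threads") do |v|
      n = v.to_i?
      unless n && n > 0
        abort "Invalid number of threads: #{v}"
      end
      threads = n
    end
  end
  threads
end

# Hypothetical helper: returns the environment to restart with,
# or nil when no restart is needed (option absent, or already applied).
def worker_env(threads : Int32?, current : String?) : Hash(String, String)?
  return nil if threads.nil?
  return nil if current == threads.to_s
  {"CRYSTAL_WORKERS" => threads.to_s}
end

# Parse a copy so the original ARGV can be passed on unchanged.
if env = worker_env(requested_threads(ARGV.dup), ENV["CRYSTAL_WORKERS"]?)
  if exe = Process.executable_path
    # Replaces the current process; the new one starts with the
    # requested worker count and skips this branch.
    Process.exec(exe, ARGV, env: env)
  end
end
```

The cost is one extra process start, but nothing inside Crystal::Scheduler is touched, so it should not depend on private runtime details.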

You can perhaps use my library, nested_scheduler (GitHub: yxhuvud/nested_scheduler, a shard for creating separate groups of fibers in a hierarchical way and collecting results and errors in a structured way), which provides this functionality (achieved by MONKEYPATCHING ALL OVER THE INNARDS OF THE LANGUAGE IMPLEMENTATION). Be aware that I haven’t checked whether it works with 1.11 yet, and I won’t have time to do that before the weekend. It tends to break with new Crystal versions because it does not respect the private boundaries of the implementation. That is fine with me, but consider yourself warned. It can and will break.

But please file issues if it breaks - earlier this year it took a couple of months before I noticed it was broken.

You are definitely all experts. Unfortunately, as far as I know, many people like you are not interested in contributing to Crystal itself, and instead create new shards for some reason.

WARNING: nested_scheduler replaces the built in scheduler with itself, which means that PROGRAMS THAT SPAWN FIBERS WILL NOT EXIT UNTIL ALL FIBERS HAVE STOPPED RUNNING. This is in general a very good thing, but it may be disruptive for programs not built assuming that.

I think this is a really good thing. I don’t know if there has been any discussion about it.


Hi, I read through the README but am still confused. Does the following code create a new thread for each pool.spawn { ... } when built with -Dpreview_mt?

NestedScheduler::ThreadPool.nursery do |pool|
  pool.spawn { sleep 5 }
  pool.spawn { sleep 2 }
end

Or should it be written like this?

NestedScheduler::ThreadPool.nursery(thread_count: 2) do |pool|
  pool.spawn { sleep 5 }
  pool.spawn { sleep 2 }
end

I don’t think you’re doing them justice here. Both @yxhuvud and @MistressRemilia have made valuable contributions to Crystal in the past.

The thing is, the path to getting a change into a large project like Crystal has some intentional obstacles. They ensure changes are reasonable and beneficial for the language and community.
You need to cover all kinds of details, edge cases and compatibility considerations, making a convincing argument to gain Core Team approval. Especially for complex and integral components such as the scheduler, this is a very difficult task.
Launching an independent shard meets much less resistance because you’re calling the shots. There’s no need to consider a broad range of existing use cases. This is great for getting an idea out and seeing how it works in practice. I think this is in fact very valuable for gathering information, which may lead to getting things merged upstream eventually.

I think this is a really good thing. I don’t know if there has been any discussion about it.

About things not exiting before fibers are done, see “Notes on structured concurrency, or: Go statement considered harmful” on the njs blog. Well worth a read, and it basically redefined how I thought about concurrency back when it was posted.
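
To make the “does not exit until all fibers have stopped” idea concrete, here is a minimal sketch of a nursery-like helper built only on the standard library. `TinyNursery` is a hypothetical name. It is not how nested_scheduler is implemented, and it ignores error collection and multi-threading entirely.

```crystal
# Hypothetical helper: the block passed to TinyNursery.open does not
# return until every fiber spawned through the nursery has finished.
class TinyNursery
  def initialize
    @done = Channel(Nil).new
    @count = 0
  end

  # Spawns a fiber and remembers that it has to be waited for.
  def spawn(&block : ->)
    @count += 1
    done = @done
    ::spawn do
      begin
        block.call
      ensure
        done.send(nil) # signal completion even if the block raised
      end
    end
  end

  # Blocks until every spawned fiber has signalled completion.
  def wait
    @count.times { @done.receive }
  end

  def self.open
    nursery = new
    begin
      yield nursery
    ensure
      nursery.wait
    end
  end
end

results = [] of Int32
TinyNursery.open do |nursery|
  nursery.spawn do
    sleep 20.milliseconds
    results << 1
  end
  nursery.spawn do
    sleep 10.milliseconds
    results << 2
  end
end

# Both fibers are guaranteed to have finished at this point.
puts results.sort
```

With a plain top-level spawn, the program could reach the last line (or exit) before either fiber had run. Appending to a shared array like this is only safe in the default single-threaded mode.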

Does the following code create a new thread for each pool.spawn

No. The first example will create a dedicated thread to execute ALL fibers spawned in the pool in that single thread, and the second will create 2 threads dedicated to executing them. How many times you spawn does not impact the number of threads created.
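
The same point, that the number of fibers and the number of threads are independent, can be seen with the plain top-level spawn. This is only a sketch using the default scheduler, not nested_scheduler, and it relies on Thread.current, which is internal runtime API and may change.

```crystal
# Each fiber reports the identity of the thread it runs on.
# Assumption: Thread.current is undocumented, internal API.
ids = Channel(UInt64).new
fibers = 8

fibers.times do
  spawn { ids.send(Thread.current.object_id) }
end

threads = Set(UInt64).new
fibers.times { threads << ids.receive }

puts "#{fibers} fibers ran on #{threads.size} thread(s)"
```

Without -Dpreview_mt this should report a single thread. With it, the count is bounded by the number of workers, however many fibers are spawned.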

Regarding

You are definitely all experts. Unfortunately, as far as I know, many people like you are not interested in contributing to Crystal itself, and instead create new shards for some reason.

So the thing is, the shard is mainly an accident. It is not at all what I set out to build. If it had been a serious attempt at [RFC] Structured Concurrency (issue #6468 in crystal-lang/crystal), it would have started at the other end, by building the thing that is missing in the current implementation: nurseries that don’t spawn new thread pools but instead work in an existing thread pool. That should, by a big margin, be the most common way to use it.

So why did I start at the wrong end? Well, because I had to. The problem I was aiming to solve was different: I wanted a pluggable event machine to drive Crystal. That is the main feature of nested_scheduler, even if it is not documented. It is also probably not something Crystal upstream will want to have, even if the work to get there is pretty low once it is possible to set up custom pools. In any case, it turned out that it is quite useful to be able to spin up independent thread pools on the fly, so I extracted those parts into a shard and put it out there.

So why have I not tried to upstream it? Well, as I mentioned, it is lacking the most central feature, in that it doesn’t support creating nurseries without creating new threads. Additionally, there are other features missing in Crystal’s scheduling and threading, and one of those is work stealing. The current approach, which assigns fibers round-robin and then keeps each fiber locked to whatever thread it ended up on, is quite limiting and creates a lot of issues. So that also needs to be solved, and having a variable number of threads around will without doubt make things a lot more complex to reason about. Getting all that to work together will be a huge challenge, and it would be good to have at least an idea of what implementation is wanted before starting to upstream stuff.
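
A toy model may help show why round-robin with pinned fibers is limiting. The sketch below is not the real scheduler: it treats each fiber as a fixed cost and compares round-robin assignment with always picking the least-loaded worker, which is an idealized stand-in for what work stealing tries to approximate. All names are hypothetical.

```crystal
# Toy model only: fibers are plain integer costs, workers are load totals.

# Assign fibers to workers in turn and never move them afterwards.
def round_robin_loads(costs : Array(Int32), workers : Int32) : Array(Int32)
  loads = Array.new(workers, 0)
  costs.each_with_index do |cost, i|
    loads[i % workers] += cost
  end
  loads
end

# Idealized balancing: each fiber goes to the currently least-loaded worker.
def least_loaded_loads(costs : Array(Int32), workers : Int32) : Array(Int32)
  loads = Array.new(workers, 0)
  costs.each do |cost|
    idx = loads.index(loads.min).not_nil!
    loads[idx] += cost
  end
  loads
end

costs = [10, 1, 10, 1]
puts round_robin_loads(costs, 2)  # => [20, 2]
puts least_loaded_loads(costs, 2) # => [11, 11]
```

With round-robin, one worker ends up with both expensive fibers while the other sits mostly idle, and nothing can rebalance that once the fibers are pinned.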

About things not exiting before fibers are done, see “Notes on structured concurrency, or: Go statement considered harmful” on the njs blog.

Cool, will read it carefully.

My knowledge is limited, so I don’t understand what you mean.

No. The first example will create a dedicated thread to execute ALL fibers spawned in the pool in that single thread

Well, if this is true, what is the difference when adding more than one pool.spawn?

For example, what is the difference between Example 1 and Example 2 below?

Is pool.spawn the same as Crystal’s default spawn, apart from waiting for the fibers to finish?

# Example 1
NestedScheduler::ThreadPool.nursery do |pool|
  pool.spawn { do_same_thing }
end
# Example 2
NestedScheduler::ThreadPool.nursery do |pool|
  pool.spawn { do_same_thing }
  pool.spawn { do_same_thing }
  # ......
end

Yes, except that any fibers that are spawned are run in the pool in question, rather than in the global pool.

have made valuable contributions to Crystal in the past.

“In the past” is one of the key points, IMHO.

I think there are many much easier and useful parts to work on, yet people create shards instead of contributing to Crystal itself.

Yes, except that any fibers that are spawned are run in the pool in question, rather than in the global pool.

Hi, thank you for the answer. I have one more question about this: what is the difference between spawning fibers in two pools instead of only one (in the case of only one thread)?