Can CRYSTAL_WORKERS=X be set in source code?

That may apply to the kind of software you write, but it hardly works as a general rule.

I certainly wouldn’t want some random simple CLI tool to spin up a ridiculous number of threads by default just because it happens to run on a machine with 50 cores.

We should give developers more flexibility in defining the multithreading characteristics of a program. As a simple first step, I would consider a compile-time variable to specify the default value (it could still be overridden at runtime, but it would provide a default that adapts to the kind of program). A special value representing the number of available cores might also be a good idea to improve that story (specifically for the high-computing use cases mentioned by @jzakiya).
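To illustrate the idea, a compile-time default could be sketched with Crystal's macro-level `env`. This is only a sketch: `WORKERS_DEFAULT` is a hypothetical build-time variable, not an existing feature; only the runtime `CRYSTAL_WORKERS` variable exists today.

```crystal
# Hypothetical sketch: bake a default worker count into the binary at compile
# time, while still letting the runtime CRYSTAL_WORKERS variable win.
# `env` inside a macro expression is evaluated at compile time.
DEFAULT_WORKERS = {{ (env("WORKERS_DEFAULT") || "4") }}.to_i

def effective_workers : Int32
  # The existing runtime override takes precedence over the baked-in default.
  ENV["CRYSTAL_WORKERS"]?.try(&.to_i) || DEFAULT_WORKERS
end
```

With something like this, `WORKERS_DEFAULT=8 crystal build app.cr` would produce a binary that defaults to 8 workers while still honoring `CRYSTAL_WORKERS` at launch.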


Strong disagree. Having to tune to avoid contention is no better than having to tune to use as many resources as you possibly can. A number like 4 that’s some reasonable way in between 1 core and all the cores is a decent compromise between our two use cases.

Even stronger disagree here. I already mentioned above how the JVM controlling how many resources it uses makes it unstable in containerized environments. The person running it, not the person who wrote it, should control how many resources it gets.

Is there something stopping you from putting export CRYSTAL_WORKERS=8 in your shell rc file? This would give you exactly what you want.
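For example, assuming a bash/zsh setup (the program name below is a placeholder):

```shell
# Persist the setting for future shells by appending once to your rc file:
#   echo 'export CRYSTAL_WORKERS=8' >> ~/.bashrc

# Set it for the current shell session:
export CRYSTAL_WORKERS=8

# Or scope it to a single invocation ("./my_program" is a placeholder):
#   CRYSTAL_WORKERS=8 ./my_program
echo "CRYSTAL_WORKERS is $CRYSTAL_WORKERS"
```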


For CLI tooling, I would be surprised if it used anything but one core by default, TBH. Multicore or not should be an easy override on the command line (like --target=multicore:4). None of the regular CLI tools I use spin up additional threads.

I think there is another case to be made for allowing this to be set at runtime by the program. In the case where you distribute a Crystal program, it would be nice to be able to specify the number of workers for the user, or allow the user to set the count via some sort of app config.


Let me say it one more time, I want anyone who uses my program to be able to programmatically control the use of whatever threads exist on the system it’s run on.

Please stop talking about what I can do on my system, external to the source code.

I don’t want anybody to have to do anything other than run the program and have it optimally use all the threads available on whatever OS|hardware it’s run on, in the easiest way possible.

This is the first time I’ve seen you say this, and even using the site search I can’t find any instance where you’ve expressed this as the reason behind your complaint, but I’ve seen you talk a half-dozen times about how you don’t like setting it every time you personally run your programs.

I was giving you a reasonable solution to the specific problem you’ve been expressing. You ignored my replies rather than clarifying why you wanted it both in this thread and in another one last year where I gave you the same advice.

If you want to do things in code, you could provide a bootstrap script that sets up the environment variable for the user before running your program. This is a very common approach. A few examples:
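A minimal sketch of such a launcher, assuming a POSIX shell and `nproc` being available (the `myapp` binary name is a placeholder):

```shell
#!/bin/sh
# Launcher shipped next to the real binary: pick a worker count (respecting
# any value the user already set), export it, then hand off to the program.
: "${CRYSTAL_WORKERS:=$(nproc 2>/dev/null || echo 4)}"
export CRYSTAL_WORKERS
echo "launching with CRYSTAL_WORKERS=$CRYSTAL_WORKERS"
# exec "$(dirname "$0")/myapp" "$@"   # placeholder binary name
```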


In an attempt to take this discussion in a useful direction: should the thread pool be created explicitly, with the caller having to add the fibers to it (maybe do some block magic to make it prettier)? That would allow arbitrary code to be run before initialization, which would solve @jzakiya’s problem. The overhead compared to the actual logic in the fibers should be negligible for non-toy programs.

I think a “default” factory method that reads the environment variable with a fallback to 4 would be useful. If more complex configuration is desired in the future, this would provide a cleaner way instead of cramming complex data types into environment variables. Also, if someone wants to have a few fibers share 2 threads, while another set share another 2 threads, they could do that.

I tried @pseudonym’s answer, and it works! @jzakiya, maybe you can try this, though it may still not let you programmatically control the number of CPUs as a parameter of a CLI tool.

Each program has its own design considerations.
One needs workers = Cpu.count.
Another needs workers = 2 * disks.
Some prefer low worker counts because they don’t do many parallel operations.
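Those per-program defaults are all one-liners in code; a sketch using the stdlib's `System.cpu_count` (the `disks` value is hypothetical, something the program would discover itself):

```crystal
# Different programs want different worker counts.
cpu_workers = System.cpu_count              # one worker per core
disks       = 4                             # hypothetical: detected disk count
io_workers  = 2 * disks                     # I/O-bound: two workers per disk
small_pool  = System.cpu_count.clamp(1, 4)  # service with at most 4 active fibers
```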

Perhaps there should be an optional per-program override for the default number of workers:

# Only used if the environment variable is empty
CRYSTAL_WORKERS_DEFAULT = ->() { ... }
# or
Thread.workers_default = ->() { 128 }
# or
Scheduler.workers_default = ..
# or
...

(Apparently I previously posted this in the wrong thread)

I think what we should do is have an environment variable that is read at compile time that controls the number of threads, with a possibility to rely on the number of cpus. Then OP could compile their program like that and distribute it, and make sure it uses all cpus. Problem solved.

Yeah, I think that’s what I suggested in Can CRYSTAL_WORKERS=X be set in source code? - #21 by straight-shoota

For some use cases you may need more flexibility, though. For that you could override Crystal::Scheduler.worker_count. That should allow you to read the value from a configuration file, for example. Maybe we should document this method as part of the stdlib API.
Or for maximum flexibility of thread creation, you can override Crystal::Scheduler.init_workers.
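A rough sketch of that override, under stated assumptions: `Crystal::Scheduler` is internal API whose method names and visibility may change between releases, multithreading requires compiling with `-Dpreview_mt`, and the `workers.conf` file name is hypothetical.

```crystal
# Reopen the internal scheduler (unsupported internals; may break on upgrades).
class Crystal::Scheduler
  def self.worker_count
    # Respect the environment variable first, then an app config file,
    # then fall back to the stdlib default of 4.
    if value = ENV["CRYSTAL_WORKERS"]?
      value.to_i
    elsif File.exists?("workers.conf") # hypothetical config file
      File.read("workers.conf").strip.to_i
    else
      4
    end
  end
end
```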


How to set 2x disks? Or cpu_count.clamp(1, 4) for a service that has at most 4 active Fibers?

Yes please. This seems like the simplest and most comprehensive solution.

My opinion here is that I agree that there should be a reasonably small number of threads created by default, and that number should be possible to modify using a global variable.

But one size does not fit all, and I do agree with jzakiya that programmatic control over threads is good to have. A single global value does not give enough control. I think it should be possible to set up separate thread pools with separate scheduling that run specific tasks, both for one-off computations and for long-running tasks.

I want something like

NestedScheduler::ThreadPool.nursery(thread_count: 16) do |pool|
  16.times do 
    pool.spawn { perform_slow_computation }
  end
end

which would then spawn 16 threads and start a fiber for each thread. Then the created threads would live until all fibers have completed.

The example works using GitHub - yxhuvud/nested_scheduler: Shard for creating separate groups of fibers in a hierarchical way and to collect results and errors in a structured way, which implements it (by overriding a lot of private internals; expect it to break for a while on every new Crystal release). It is a first stab at getting #6468 somewhere. Much is still lacking (and there are probably many bugs), but the big showstopper keeping me from starting the discussion about upstreaming it into Crystal proper is that it doesn’t yet support setting up a nursery where the fibers use the same thread pool as the parent. I feel that is a necessary feature, since setting up threads is too costly for many use cases.
