Can CRYSTAL_WORKERS=X be set in source code?

CRYSTAL_WORKERS=4 is the default if you run a multi-threaded program.

In my source code, can I set it at the end with something like:

`CRYSTAL_WORKERS=8`

so I don’t have to do it explicitly every time I run the program like:

$ CRYSTAL_WORKERS=8 ./program

and then can do just:

$ ./program

Pretty sure it’s just an env var, so prob could just get away with export CRYSTAL_WORKERS=8 in your shell or like ENV["CRYSTAL_WORKERS"] = "8" in code.

That didn’t work, still defaults to 4 threads.
Tried different variants, same results.
Will play with later when I have time.

In that case it probably is already set at that point, so you’re better off just exporting it in your shell. Same end result, but without tightly coupling your code to a specific thread count.

Yeh.

Internally in the main routine I do:

threads = System.cpu_count

So for whatever system it’s run on it will know how many threads it has to use.

I’m trying to have the source code set up the runtime to use the max threads for any system, so a user doesn’t have to set it manually every time the program is run.

Make some wrapper script for it? Or maybe it would be possible to override the Crystal.main, define the env var, then call default implementation of it. Tho I’d say the wrapper script would be the simplest…
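For reference, the wrapper-script idea could look roughly like the sketch below. It is only an illustration under some assumptions: the function names (detect_workers, run_with_workers) are made up for this example, nproc or getconf is assumed to be available, and the fallback of 4 just mirrors the default mentioned earlier in the thread. A value the user has already exported is left alone.

```shell
#!/bin/sh
# Hypothetical wrapper: picks a CRYSTAL_WORKERS value, then runs the given program.

# Print the number of online CPUs; fall back to 4 if no tool is available.
detect_workers() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif command -v getconf >/dev/null 2>&1; then
    getconf _NPROCESSORS_ONLN
  else
    echo 4
  fi
}

# Run a command with CRYSTAL_WORKERS exported.
# A value already set by the user takes precedence over the detected one.
run_with_workers() {
  : "${CRYSTAL_WORKERS:=$(detect_workers)}"
  export CRYSTAL_WORKERS
  "$@"
}

# When called with arguments, treat them as the program to launch.
if [ "$#" -gt 0 ]; then
  run_with_workers "$@"
fi
```

Saved under a name such as run-workers.sh (hypothetical), it would be used as ./run-workers.sh ./program, which keeps the thread count out of the program's source.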

I gotcha. Trying to see if I could do it the easiest way possible first.

Yes, at the point your main code executes, the Crystal runtime has already initialized the worker threads. You can’t change that number programmatically at runtime.

Just curious, is there an interesting reason for that?

In its current state, it’s because spinning up the fiber schedulers (one per thread) only happens at application bootstrapping time. IIRC last time I looked at the fiber scheduler code, nothing actually manages them after they’ve spun up — it simply creates the threads.

To make this adjustable at runtime would require a reconciler that will spin up new threads or terminate old ones after all fibers in those schedulers have completed. This is a complex task with a lot of edge cases:

  • If a fiber is stuck in a Channel#receive call, it will never complete, so the reconciler will never be able to collect it.
  • If the counter is increased again, should it count that thread whose last remaining fiber won’t complete?
  • When scheduling a fiber, how do you guarantee the fiber won’t be assigned to a scheduler that is currently being drained?

These aren’t questions for you to answer, to be clear. They’re just questions that would need to be answered if a reconciler were to be added. Coordinating this is still a pretty hard problem. Comparatively speaking, processes are relatively cheap. If you need to change it, the most effective way to do it would be to restart the process with the appropriate value.

The best way to avoid writing CRYSTAL_WORKERS=8 every time you invoke your app is to put this in your shell config:

export CRYSTAL_WORKERS=8

Then you never have to worry about it again.


Can’t the default be a higher number, so that if a system doesn’t have that many threads it will only use what it has? That’s better than leaving hardware threads unused by default.

Yes, of course the default value can be improved. Whether it should always saturate all cores by default is questionable, though. The “best” behaviour for the worker count depends very much on the type of application and its deployment environment.

Multithreading is still not a stable feature in Crystal, so there just hasn’t been much thought put into what more complex default behaviour could make sense.


Exactly

I, as a user, want full programmable control over the total resource base of the hardware I’m using, including the use of its threads.

Crystal is currently the only compiled language I am aware of that doesn’t provide, by default, full use of a system’s threads. In all the other languages I’ve written apps in (C|C++, D, Go, Nim, Rust) they use the number of threads the system has.

I realize Crystal’s multi-threading model is young, and really a concurrency model based on fibers and not a true parallel processing model based on threads. I hope sometime (soon) Crystal will have a true parallel programming model. It needs one to compete against these other languages in those domains where it’s necessary.

Maybe circa 2010/11 a default of 4 threads made sense, when most of the systems in existence were Intel CPU-based 32-bit 2C|4T systems. A decade later most of the (commercially available, home, mobile) systems are at least 64-bit 4C|8T. Within the next 5-10 years, the base systems will be 8|12|16-thread AMD|Arm|et al systems.

Limiting any access to system resources shouldn’t be a part of the language. Any limitation is an arbitrary assessment somebody made of what a user needs to be able to control on their own system.

I would love for Crystal to implement|mimic the ease and performance Rust has with its Rayon crate for parallel programming. IMO, it’s better than OpenMP for C|C++, at least for what I’ve used it for. It’s so simple to use, and hides all the technicalities from the user. I would urge at least understanding, philosophically, its approach to performant|safe parallel programming.

Thus, I want to be able to programmatically control all the threads my system has, in a safe and performant way. Should be real easy, right! :grin:

This may be a bit of a hack, but maybe something like this would work?

#!/usr/bin/env crystal

unless options = ENV["LAUNCHED_WITH_OPTIONS"]?
  puts "Launching with options..."
  system "LAUNCHED_WITH_OPTIONS=1 #{Process.executable_path}"
  exit
end

puts "Launched with options!"

Every one of those languages has far more developers working on it than Crystal, with orders of magnitude more money poured into their development. You’re right about Go, at least since 1.5 (before that GOMAXPROCS defaulted to 1, which was the case for roughly six years after Go’s first public release), and I can’t speak to Nim. But last time I used C, C++, D, and Rust, they all required you to create, manage, and collect your own threads. Do they have concurrency primitives now that don’t have a 1:1 mapping with POSIX threads?

Additionally, the number of cores on a machine isn’t sufficient context for a decision like this. It seems like the right move, especially for your specific use cases, but that carries an assumption that it’s the only thing using significant CPU on the whole machine, which won’t be true for folks deploying web services with it. For example, the JVM does exactly what you’re asking for here and causes problems in containerized deployment environments (it consumes CPU and RAM based on what the machine has rather than what’s available to the container). These kinds of things all need to be considered.
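To make the container point concrete, here is a rough sketch of how a launcher could cap the worker count by a CPU quota instead of trusting the host's core count. It assumes a Linux system using cgroup v2, where the quota is exposed in a cpu.max file containing either "max PERIOD" or "QUOTA PERIOD"; the function name effective_cpus and the default path are assumptions for illustration, and other setups (cgroup v1, non-Linux) would need different handling.

```shell
#!/bin/sh
# Print a CPU count limited by a cgroup v2 quota, if one is set.
# $1: path to the cpu.max file (assumed default: /sys/fs/cgroup/cpu.max)
effective_cpus() {
  cpu_max_file="${1:-/sys/fs/cgroup/cpu.max}"
  # Start from the number of online CPUs the host reports.
  cpus="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
  if [ -r "$cpu_max_file" ]; then
    # File format: "<quota> <period>", where quota may be the word "max".
    read -r quota period < "$cpu_max_file" || true
    if [ "$quota" != "max" ] && [ "${period:-0}" -gt 0 ]; then
      # Round the quota up to a whole number of CPUs.
      limit=$(( (quota + period - 1) / period ))
      if [ "$limit" -lt "$cpus" ]; then
        cpus="$limit"
      fi
    fi
  fi
  echo "$cpus"
}
```

A launcher could then run something like CRYSTAL_WORKERS="$(effective_cpus)" ./program, so a container limited to two CPUs on a 64-core host would not start 64 workers.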


Crystal lets you redefine main for exactly this kind of thing:

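fun main(argc : Int32, argv : UInt8**) : Int32
  # Set the variable before the Crystal runtime starts its worker threads.
  # The last argument (1) tells setenv to overwrite any existing value.
  LibC.setenv("CRYSTAL_WORKERS", "8", 1)
  Crystal.main(argc, argv)
end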

Fortran and C mostly do not have that either. You have to go quite far up in the current standards to find threading support. It is typically not a language construct but an aspect of the runtime. Crystal has a much more managed runtime than C, so supporting threading is a bigger undertaking in Crystal than in C.


@jzakiya In the program you are trying this on, did running it with 8 workers instead of 4 lead to a performance improvement?

Oh yes, it’s designed to run faster with more threads. If you want I can show you the results. Even better, I can show you the results with the Rust version that can do (now) everything I want in parallel (with an 8 thread system).

I think having the ability to use fewer than the max system threads can be useful, but it shouldn’t be the default. And it should all be under program control.