Eliminate environment modifications

Modifying a process’ environment is generally not thread-safe because it is memory shared between all threads of a process.
System functions like setenv or SetEnvironmentVariable are based on a global variable (environ on unix systems). When a thread updates the environment configuration, it may reallocate the memory. Thus, any other thread which had previously obtained the pointer before the update and is now reading from the old memory, may be left with corrupt data.

We could offer some mitigation by synchronizing access through Crystal’s ENV interface.
However, that would still not be 100% safe because it cannot account for direct access via the system APIs.

But maybe there’s a simpler solution: Disallow environment modifications, i.e. deprecate ENV.[]=.

This idea is based on this hypothesis:

Unless your process is a shell or adjacent, it shouldn’t need to mutate its environment.

We’d need to look a bit more closely at specific use cases, but I think this might be a valid option.

It should be feasible to keep the functionality of ENV.[]= available in a portable interface, but with an explicit warning that it’s potentially unsafe (e.g. ENV.unsafe_set).

Safe usage could only be guaranteed in a single-threaded process. It should be safe to set environment variables at the start of a multi-threaded process, before it spawns any other threads.

A common usage pattern is when you need to adjust the environment for a piece of code that you cannot configure directly but depends on environment variables.

This is primarily relevant for testing code that is supposed to react to environment settings. For example, we test several features in the standard library and compiler using the with_env helper.
This functionality seems reasonable and relevant.
But instead of modifying the process’ environment, we could mock it using a fiber-local overlay in the ENV interface.
In some cases, it might even be sufficient to just pass in configuration as explicit arguments.

I’m not aware of any specific other use cases. But I could expect instances where you need to modify the environment for calling a library methods which are not using Crystal’s ENV interface and thus need to read from the actual process environment.
That generally smells like bad design, but there might be some use cases.

I’m posting this thread to ask for feedback and examples.

4 Likes

Deprecating ENV#[]= sounds bold at first… but in practice ENV should only be accessed on application startup to setup/override application settings (e.g. TZ, DATABASE_URL, FOO_ENV), then never used again, with a few exceptions, for example accessing PATH to spawn a child process —but does that imply that PATH needs to be dynamically changed at runtime?

Some notes:

  • Neither environ nor the *env functions in POSIX are expected to be thread safe. Glibc has only been made reentrant and signal safe in 2.41 for example (released in 2024).
  • We could protect ENV with a rwlock, but libc functions such as execvp (or getaddrinfo on glibc) call getenv (to get PATH or HOSTALIAS). They’d bypass the lock… unless we wrap the libc calls known to access the environment :thinking:
  • External libs can also access or alter the environment, and would also bypass the lock… which means we should expose the lock… :unamused_face:

Now, removing ENV#[]= wouldn’t make the issue nonexistent: external libs can still mutate the environment under the hood, but that’s an issue with the external lib I guess. I believe libc should only read and thus be safe.

1 Like

On server applications it’s a very common pattern to keep configuration in a dot-env file and load it into the app environment using something like gdotdesign/cr-dotenv

FWIW github search with filters https://github.com/search?q=lang%3Acrystal+%2FENV%5C%5B.%2B%5C%5D+*%3D%5B%5E%3D%5D%2F+-path%3Aspec%2F+-is%3Afork+-repo%3Acrystal-lang%2Fcrystal&type=code&p=1

How would this work for shards that read in a .env file and load those in to ENV?

I suppose provisioning environment variables at process startup is a special use case that could be handled safely.
It could be placed in front of spawning threads.
We’ll need to figure out how exactly this should be implemented, but it seems very feasible to me.
For example, LuckyEnv.load could use an unsafe replacement for ENV.[]= and raise if more than one thread is running.

Another solution and common practice is to load the .env file into the environment before executing the actual process (cf. The Twelve-Factor App). Then you don’t have to deal with this at all. This even works for development mode.

2 Likes

This just makes me sad. so instead of using a config file, the server looks at the environment and that is filled on startup from a config file.

I can’t see this as anything else but the Cargo Cult equivalent of software development.

(on topic, i think making the environment unmutable through the interface is absolutely the correct thing to do. If you want to play dangerous, you can always us an external lib.)

Wow. “Cargo cult equivalent of software development”. Sure dude (looks to see if this place has a mute button)

4 Likes

My main use case is Dotenv.load? in development, as @ralsina mentioned. It’s a pretty decent 12-factor shim. I don’t deploy .env files for the same reason @mavu points out, but it’s handy in development to simulate the way my production environment will load its config.

This is my preferred approach. For my use case, ENV#[]= is only ever called at app boot time so it doesn’t need thread safety, but thread safety also doesn’t harm it.

I also generally don’t touch ENV at all after bootstrapping, though. All my ENV-based config is loaded into either constants or an application config object. This way I avoid calling getenv in a hot loop.

I think this is a reasonable limitation to accept. If you’re using system APIs, you generally have to provide your own guarantees for thread safety and other things like memory access anyway. Using system APIs directly is an advanced concept in Crystal so, to me at least, it makes sense for that to be explicitly unprotected.

Indeed, 100% on startup, load the env into the internal config object and that’s it.

The main problem with ENV.[]= is that it doesn’t look like a here-be-dragons system API. It appears as a convenient and inconspicous high-level Crystal interface, which we usually try to bubblewrap so you don’t accidentally shoot yourself in the foot.
But it is not possible to make this safe.

So the least we should do is to make the dangerous nature clearly visible, for example by renaming it more obviously.

Is there a way to add to the ENV at compile-time? Maybe a mix of a {% read_file %} and some sort of {% set_env(“LUCKY_ENV“, value) %}. In that case there wouldn’t be a reason to use ENV[]= at runtime.

I don’t think that would work. At least not in a way that would be reasonably simple to implement.

Also, baking the .env file from the development environment into the executable sounds like a secrets disclosure waiting to happen :see_no_evil_monkey:

1 Like

Oh yeah, it’s the opposite of what .env is for (having secrets in a local gitignored file for development)

I went and audited Athena usages of ENV#[]=, there aren’t too many places it’s used. The Dotenv component is the most obvious. But as already mentioned that’s intended to be used before starting the server or anything so is probably fine.

Other than that, there are a few usages in the Console component:

Given both of these things happen in the context of a CLI command execution before the actual command runs, I’m thinking that they’re also fine as is?

So what I wonder is, how often do you need to have something that is ACTUALLY raw environment variables? Would something like a proxy with program specific overrides not solve most of the use cases? Perhaps with a hatch to access the system level in the once-in-a-blue moon case where you actually need it?

Such a proxy could have a threadsafe API, and be mutable.

1 Like

My understanding is that the API is still not thread safe in general - only that they changed so that `getenv` will leak memory forever instead of crashing. `setenv` is still borked, if I remember correctly.

Unfortunately broken by design. Someone needs to build a new interface to replace the old one to really fix it.

Instead of the dotenv-like shards, I can’t recommend enough to try the excellent totem shard :heart_eyes:

It’s dedicated to “application settings” in general and can read and consolidate from many sources (ENV, .env, JSON, etcd, custom backends, …), directly populate classes/structs, export settings, …

@yxhuvud They already never deallocated the old environ pointer, but 2.41 added some atomic ops and fixed the behavior. See set_var/remove_var are unsound in combination with C code that reads the environment · Issue #27970 · rust-lang/rust · GitHub and links.

How about decoupling ENV from the lowlevel system interface?

I was thinking that if you want to modify the env of a sub-process, but want to avoid ENV.[]= you’d end up maintaining your own env set and pass it to Process.run.

But if ENV was more of a simple variable which got preloaded with the process env, which was used per default by run et al, one could keep on pretending that it just works (save for libraries that go to the process environment).

1 Like

That’s an alternative we thought of, but we’re not sure whether it’s really valuable.

We’d always dup environ and thus allocate memory for every key/value during startup, and keep them forever in GC HEAP. Unlike libc that never deallocates old environ buffers (e.g. glibc) the memory could be garbage collected, but ENV shall only be modified (or even accessed) during startup mostly (for Process.run you should pass an explicit env) so that advantage is kinda moot.

Without talking of external libs, it will interfere with libc calls, like execvp that needs to getenv("PATH"). Granted, we have our implementation to search the path, while we don’t use other cases or they aren’t portable (even on the same OS but different libc).

So, we don’t see much incentive to bother with that. There’s apparently not much appeal to mutate ENV at runtime except for the dotenv-like scenarios, for which there are better alternatives.