Improve Process API

j8r · March 21, 2019, 4:33pm

I find the Process API with Process.new and Process.run confusing, often mixing both forgetting which one is blocking or non-blocking.
Another pain point compared to said, Python, is to easily create Strings from the STDOUT and STDERR from a subprocess.
I created https://github.com/j8r/exec.cr to experiment how to improve this API.

Exec.run is always non blocking (we can have a variant taking no blocks for the stdlib)

Exec.new("/bin/ls", ["/tmp", "-l"]) {}

To change it to blocking

Exec.new "/bin/true", &.wait

And finally to catch the STDOUT and STDERR to strings, and puts its success

output, error = Exec.run "echo hello", { |process| puts process.wait.success? } #=> true
puts output.to_s #=> hello

Discard the basic shell argument interpreter in the library for this proposal.

This is essentially a draft, the main points I want to underline is:

can we have a output, error = Process.new API?
can we remove the blocking Process.run and only have the non-blocking Process.new to remove any confusion/duplication?

stronny · March 21, 2019, 6:38pm

I would say that process management in Crystal is not very usable in general at the moment. The most common use-case would probably be this: block the current fiber until the child exits, then give me its exit status, stdout and stderr. A slight tweak would be to additionally feed its stdin.

The other use-case, and a more general problem is an async event-driven flow: start this child, and while it runs feed its stdin, and read its out and err in chunks, and process these chunks. The problem here is a mental model to be able to not go insane. Fibers and channels do not help here at all, what I think would work is libev’s event-loop, something like this:

loop do
  event = sleep_until_one_of_these_events_happen \
    child.exited,
    child.stdin.ready_to_write,
    child.stdout.ready_to_read,
    child.stderr.ready_to_read,
    timeout.fired,
    its_time_to_stop.raised  # something like a sigint to this fiber
  process event # with overloading depending on event's class
end

It’s useful to aggregate different events in a single fiber to avoid a thin paste of one-line fibers all over the code, which is of course just my opinion.

paulcsmith · March 21, 2019, 6:52pm

I also have trouble with the Process syntax because the defaults require a lot of arguments and it is not always clear whether is is blocking or not.

In all my projects I write some kind of helper method that wraps Process because of that.

Clarify blocking v. non-blocking

I’m not sure why Process by default would be non-blocking, but either way I think we could clarify this by calling the methods what they are so it is always Crystal clear what is going on

Process.sync(...args)
Process.async(...args)

I think that makes it incredibly easy to tell what it is doing. You don’t need to pull up the docs to know what these 2 are doing.

Not totally sold on the names, but the principles stand I think. The API could be Process.async and Process.wait or something. But in general if there are two ways to do something giving them clear names is a good way to go IMO.

Capturing output

I love the idea of making working without output easier. I think rather than returning output and error, it could return a ProcessResult object that had an #output and #error. This way you don’t have to remember which position output is. Instead you have a nice object with documentation and that can be enhanced with additional methods in the future if helpful:

result = Process.sync "echo hello"
puts result.output.to_s #=> hello

It could also have #status that returns a Process::Status for example. I think returning a ProcessResult would be super helpful and flexible

Shell true

Another part that is confusing to me is when to use shell: true. It is not clear from the documentation when to use it or not. I don’t think this is a problem with the API, but maybe some more .examples/clarification around this would be helpful.

I’d be happy to contribute but I honestly don’t know the difference between shell: true/false

Error Handling

Right now by default a failed process does not raise an error. This has generated a number of bugs for me when I forget to handle errors. It takes awhile to figure out where the problem is when there is no error.

My suggestion here is to make them raise an error by default and have ? versions that don’t raise. Or if we don’t want a breaking change (or just don’t like the ? version), introduce a ! version so that it is at least super easy raise when there is an issue, e.g. Process.sync

j8r · March 22, 2019, 12:42pm

shell: true invokes a /bin/sh -c 'mycommand' subprocesss that executes the command (so it may also create a sub-subprocess). shell: false directly execute the command, you need to have the arguments “pre-parsed” as an Enumerable.

stronny · March 22, 2019, 1:06pm

And unless you use exec in the shell command you will get neither the pid, nor the exit status of your command run by shell.

j8r · April 22, 2019, 4:03pm

At least I am not alone to find room for improvement in Process .
I’ll open an RFC.

absolutejam_gh · July 26, 2019, 7:49pm

Didn’t realise this thread existed because it was one of my points in Stdlib - A mixed discussion about some modules. If you have an RFC, I’d be interested in seeing it!

Topic		Replies	Views
Stdlib - A mixed discussion about some modules Crystal Contrib	3	563	July 26, 2019
Process::ORIGINAL_STDOUT and ORIGINAL_STDERR change Help & Support	4	576	May 9, 2023
Problem to get output of system command called Help & Support	8	414	December 24, 2021
How to run a command but not wait for it to return? Help & Support	1	145	May 9, 2024
[solved] How to run a command and get its result, and how to intercept a shell context Help & Support	6	760	July 28, 2023

Improve Process API

Clarify blocking v. non-blocking

Capturing output

Shell true

Error Handling

Related topics