Improve Process API

I find the Process API with Process.new and Process.run confusing, often mixing both forgetting which one is blocking or non-blocking.
Another pain point compared to said, Python, is to easily create Strings from the STDOUT and STDERR from a subprocess.
I created https://github.com/j8r/exec.cr to experiment how to improve this API.

Exec.run is always non blocking (we can have a variant taking no blocks for the stdlib)

Exec.new("/bin/ls", ["/tmp", "-l"]) {}

To change it to blocking

Exec.new "/bin/true", &.wait

And finally to catch the STDOUT and STDERR to strings, and puts its success

output, error = Exec.run "echo hello", { |process| puts process.wait.success? } #=> true
puts output.to_s #=> hello

Discard the basic shell argument interpreter in the library for this proposal.

This is essentially a draft, the main points I want to underline is:

  • can we have a output, error = Process.new API?
  • can we remove the blocking Process.run and only have the non-blocking Process.new to remove any confusion/duplication?
1 Like

I would say that process management in Crystal is not very usable in general at the moment. The most common use-case would probably be this: block the current fiber until the child exits, then give me its exit status, stdout and stderr. A slight tweak would be to additionally feed its stdin.

The other use-case, and a more general problem is an async event-driven flow: start this child, and while it runs feed its stdin, and read its out and err in chunks, and process these chunks. The problem here is a mental model to be able to not go insane. Fibers and channels do not help here at all, what I think would work is libev’s event-loop, something like this:

loop do
  event = sleep_until_one_of_these_events_happen \
    child.exited,
    child.stdin.ready_to_write,
    child.stdout.ready_to_read,
    child.stderr.ready_to_read,
    timeout.fired,
    its_time_to_stop.raised  # something like a sigint to this fiber
  process event # with overloading depending on event's class
end

It’s useful to aggregate different events in a single fiber to avoid a thin paste of one-line fibers all over the code, which is of course just my opinion.

1 Like

I also have trouble with the Process syntax because the defaults require a lot of arguments and it is not always clear whether is is blocking or not.

In all my projects I write some kind of helper method that wraps Process because of that.

Clarify blocking v. non-blocking

I’m not sure why Process by default would be non-blocking, but either way I think we could clarify this by calling the methods what they are so it is always Crystal clear what is going on

Process.sync(...args)
Process.async(...args)

I think that makes it incredibly easy to tell what it is doing. You don’t need to pull up the docs to know what these 2 are doing.

Not totally sold on the names, but the principles stand I think. The API could be Process.async and Process.wait or something. But in general if there are two ways to do something giving them clear names is a good way to go IMO.

Capturing output

I love the idea of making working without output easier. I think rather than returning output and error, it could return a ProcessResult object that had an #output and #error. This way you don’t have to remember which position output is. Instead you have a nice object with documentation and that can be enhanced with additional methods in the future if helpful:

result = Process.sync "echo hello"
puts result.output.to_s #=> hello

It could also have #status that returns a Process::Status for example. I think returning a ProcessResult would be super helpful and flexible

Shell true

Another part that is confusing to me is when to use shell: true. It is not clear from the documentation when to use it or not. I don’t think this is a problem with the API, but maybe some more .examples/clarification around this would be helpful.

I’d be happy to contribute but I honestly don’t know the difference between shell: true/false

Error Handling

Right now by default a failed process does not raise an error. This has generated a number of bugs for me when I forget to handle errors. It takes awhile to figure out where the problem is when there is no error.

My suggestion here is to make them raise an error by default and have ? versions that don’t raise. Or if we don’t want a breaking change (or just don’t like the ? version), introduce a ! version so that it is at least super easy raise when there is an issue, e.g. Process.sync

4 Likes

shell: true invokes a /bin/sh -c 'mycommand' subprocesss that executes the command (so it may also create a sub-subprocess). shell: false directly execute the command, you need to have the arguments “pre-parsed” as an Enumerable.

And unless you use exec in the shell command you will get neither the pid, nor the exit status of your command run by shell.

At least I am not alone to find room for improvement in Process :slightly_smiling_face:.
I’ll open an RFC.

Didn’t realise this thread existed because it was one of my points in Stdlib - A mixed discussion about some modules. If you have an RFC, I’d be interested in seeing it!