Stdlib - A mixed discussion about some modules

Morning,

I thought I’d create a more generalised thread as a way to start the discussion about some parts of the stdlib that could then be moved into their own thread/issue or be resolved quickly if there was something of a consensus.

Process

The Process module provides a high-level API, but I wonder if it’s worth either clarifying the docs or standardising some of the methods. I was a bit hung up on trying to tee the contents of stdout & stderr (show in foreground and have access to it after completion) and get the exit code at the end.

While I realise it’s a bit of an edge case, I ended up with:

  • Process.new, with output and error set to IO::Memory objects.
  • Tail the IO::Memory objects with a while Process.exists? loop.
  • Use process.wait.exit_status at the end (after I know the Process has already completed) to get the exit code.
    (Some of this may be doable in a fiber but I didn’t know the blocking semantics for this kind of process and it wasn’t required in this simple example)

This was all totally workable, but since it’s a high-level API, I wondered if it would be worth tweaking it to:

  • Provide a way of getting the Process::Status via a property instead of relying on wait, as calling wait seemed a bit strange when I already knew the process had exited.
  • Standardise all of the main entrypoint methods (.fork, .run, .new) to return a Process. Currently .run is the exception, as it returns a Process::Status.
  • Clarify docs/arguments of .fork as it seems to be impossible to get the contents of output and error as the process has forked?
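
For context, here is a minimal sketch of the .new plus wait pattern described above, assuming a Unix-like system where `ls` is available. It only captures stdout through a pipe and does not tee it to the foreground; the variable names are illustrative.

```crystal
# Start the process without blocking; Redirect::Pipe exposes its stdout as an IO.
process = Process.new("ls", ["-la"], output: Process::Redirect::Pipe)

# Reading to EOF returns once the child has closed its stdout.
captured = process.output.gets_to_end

# wait reaps the child and returns its Process::Status.
status = process.wait

puts captured
puts "exit code: #{status.exit_code}"
```

Here the status is only reachable through wait, which is the behaviour the first bullet asks about.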

IO

This one is a bit of a mixed bag: query, suggestion, request for input.

Does the current IO implementation - or child implementations, like File - provide a way to ‘stream’ (lazy-load) at a high level? Methods like .each_line would be a great candidate, or perhaps another abstraction that provides a Stream object which acts like a generator instead of loading in the entire contents into memory.

I’ve looked at the IO::Buffered docs and it looks like it should be doable with the current tools, but I’m not experienced in this kind of process - what’s a normal amount of bytes to read, do we rely on file encodings to determine when to break, etc.?

I’m just wondering what the current state is, and whether anyone has input on how we could get this implemented.

Path

I find Path a useful module. Coming from Python’s pathlib, I feel like it gives a load of great, high-level abstractions.

I had a gripe with the syntax (Path[root].join(foo, bar)) but just realised that, had I RTFM, I could have used Path.new(root, foo, bar).

However, it seems a little odd that the likes of Dir do not accept a Path object. This is only a minor objection because the .to_s method works around it, but it feels like a disconnect when the Path module is so useful, and it usually means going String -> Path -> String -> Dir. Is there a case for slightly tighter integration? Or is it fine the way it is?
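
To make the round trip concrete, here is a small sketch of the two Path spellings and the explicit .to_s conversion when calling Dir and File. The directory and file names are hypothetical, and the explicit .to_s is kept so the sketch does not depend on whether a given release of Dir accepts a Path.

```crystal
root = Dir.tempdir

# Two equivalent ways of building the same path.
demo_dir = Path.new(root, "stdlib-thread-demo", "nested")
same_dir = Path[root].join("stdlib-thread-demo", "nested")

# String -> Path -> String -> Dir: convert back with to_s at the API boundary.
Dir.mkdir_p(demo_dir.to_s)
File.write(Path.new(demo_dir, "a.txt").to_s, "hello")

puts demo_dir
puts Dir.children(demo_dir.to_s)
```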


Now, I don’t want this to come across as negative in any way. So far, Crystal has been a pleasure to write.

But I am trying to start a meaningful discussion on parts of the stdlib that I think could be improved upon (rough edges), and I’m willing to contribute code and docs to do so.

Looking forward to any feedback.


Hi!

Thank you for the thoughtful post. We never take these comments as negative; we know there is still a lot to improve.

I’ll go section by section:

Process

It would be really nice if you could show your current code. From your description of the problem:

> I was a bit hung up on trying to tee the contents of stdout & stderr (show in foreground and have access to it after completion) and get the exit code at the end

So it seems you want to output to STDOUT but also have the contents later. We can use IO::MultiWriter for that: write to STDOUT and to a separate IO::Memory which we can consume later:

output = IO::Memory.new
multi_output = IO::MultiWriter.new(STDOUT, output)

Process.run("ls", output: multi_output)

puts
puts "Above was output from the process, but here it goes again!"
puts output.to_s


(Process.run returns the status so you have that too)
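
Putting those pieces together, a minimal sketch of the original request (tee both stdout and stderr, then read the exit code) might look like the following. It assumes a Unix-like system with `ls`, and the variable names are illustrative.

```crystal
output = IO::Memory.new
error = IO::Memory.new

# Each stream is written to the terminal and to an in-memory buffer.
status = Process.run("ls", ["-la"],
  output: IO::MultiWriter.new(STDOUT, output),
  error: IO::MultiWriter.new(STDERR, error))

# Process.run blocks until the child exits, so the status is final here.
puts "exit code: #{status.exit_code}"
puts "captured #{output.to_s.bytesize} bytes of stdout and #{error.to_s.bytesize} bytes of stderr"
```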

But maybe you want to send it to STDOUT and to something that you can receive input from. That’s what IO.pipe is for.

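read, write = IO.pipe
multi_output = IO::MultiWriter.new(STDOUT, write)
done = Channel(Nil).new

spawn do
  while line = read.gets
    puts "Got line!: #{line}"
  end
  done.send(nil)
end

Process.run("ls", ["-la"], output: multi_output)

# Close the write end so the reader sees EOF, then wait for it to finish.
write.close
done.receive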


In the above snippet we probably don’t need multi output because we can already output from the read end of the pipe.

But I guess it all depends on what exactly you want to accomplish. We can try to find a way to do it and see if the existing API can solve it in a nice way.

What I know is that we are probably missing some examples in Process, or in general there’s a missing tutorial for it.

IO

> Does the current IO implementation - or child implementations, like File - provide a way to ‘stream’ (lazy-load) at a high level?

What do you mean by streaming?

> Methods like .each_line would be a great candidate, or perhaps another abstraction that provides a Stream object which acts like a generator instead of loading in the entire contents into memory.

What is a generator? The concept of generator exists in multiple languages with different meanings.

Maybe showing what code you have in mind would help me understand this.

Path

Path is a very recent addition to the standard library. That’s why it’s not available in many methods yet.


Process definitely needs better documentation. I’ve had to read the source to understand what each method does and how to use it. A few examples would already go a long way :+1:

Path was only added in 0.28.0 and still needs to be integrated with other APIs. PRs are welcome =)

After a bit more poking about, I looked into IO a bit more and found the likes of IO#each_line and the Iterator construct, which I had thus far been missing. :+1:
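
For anyone landing here later, a minimal sketch of that lazy, line-by-line reading could look like this. An IO::Memory stands in for a File so the example is self-contained; the sample text and variable names are made up for illustration.

```crystal
# IO::Memory stands in for a File here; both are IO, so each_line is available on either.
io = IO::Memory.new("alpha\nbeta\ngamma\n")

# Without a block, each_line returns an Iterator that reads a line each time it is advanced.
lines = io.each_line

# Only the first two lines are requested from the iterator.
first_two = lines.first(2).to_a
puts first_two.inspect

# With a block, each_line yields lines one at a time instead of building an array.
IO::Memory.new("one\ntwo\n").each_line do |line|
  puts "line: #{line}"
end
```

For files, File.each_line(path) offers the same block form without opening the file by hand.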