[RFC] Don't special-case `IO#puts(String)`

IO#puts is similar to IO#print (or the equivalent IO#<<) except that after outputting that object it also emits a newline.

However, if you pass a String to puts, the behavior changes slightly: if the string already ends with a newline, no extra newline is added.

Why is this like that? We copied it from Ruby! I don’t know why Ruby does it like that, but my guess is that if you have a program like this:

while line = gets
  puts line
end

Because Ruby's gets keeps the trailing newline by default, a puts that always appended a newline would print two newlines per line.
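For illustration, here is a minimal Ruby sketch of that loop, using StringIO objects in place of STDIN and STDOUT so the result can be inspected. The helper name `echo_lines` is made up for this example.

```ruby
require "stringio"

# Echo every line of `text` with gets/puts and return what was written.
# `echo_lines` is a hypothetical helper, not part of any library.
def echo_lines(text)
  input = StringIO.new(text)  # stands in for STDIN
  output = StringIO.new       # stands in for STDOUT
  while (line = input.gets)   # Ruby's gets keeps the trailing "\n"
    output.puts line          # puts sees the "\n" and does not add another
  end
  output.string
end

p echo_lines("first\nsecond\n") # prints "first\nsecond\n"
```

The lines come out single-spaced even though each `line` already carries its own newline.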

That’s why we also do the same thing in Crystal, except… except that in Crystal gets by default will not include the newline. So this exceptional behavior makes very little sense in Crystal.

Should we remove it?

It’s a breaking change, so maybe we can consider it for 2.0 and behind a flag for now.

What do you think?

Related discussion: Nice post! I'm happy that the API keeps looking better and better. I'd like... - DEV Community


Please remove it. I have written a lot of scripts and in no situation did I need that special case. And it’s annoying that you have to add two \n if you want to print an extra blank line.


I completely agree on the subject. But the question is if it’s worth changing it - even for 2.0 - because it might subtly break things. Thus changing has a cost that might be comparable to the cost of being surprised by the behaviour.

This was previously discussed in Always make puts append a newline at the end of a string · Issue #7679 · crystal-lang/crystal · GitHub, but there was no decisive outcome. I closed that discussion about a year ago because there was no prospect of it getting anywhere.
Another related discussion: puts, newline and documentation · Issue #10592 · crystal-lang/crystal · GitHub

That being said, I support an initiative for changing it. It’s an edge case, and the cost of changing it might actually not be that high.


I have a strong feeling that if we make the change, it won’t break anything. Just a guess, though!


I assume each_line and #lines and #gets all don’t have trailing newlines?

I’m not sure I follow in which direction you propose to change behavior. To always add a newline? Or to make it an alias to print?

I actually like this behavior. puts removes the need for me to think about whether my source string ends with a newline or not. If I need an extra newline, I add an extra argument-less call to puts, which makes it really obvious to me what’s happening instead of tucking some \n somewhere at the end of something.

# print just this with a newline
puts something

# print just this with a newline followed by an empty newline
puts something
puts

# print just this exactly whatever it contains
print something

The current implementation of IO#puts is this:

  def puts(string : String) : Nil
    self << string
    puts unless string.ends_with?('\n')
    nil
  end

  def puts(obj : _) : Nil
    self << obj
    puts
  end

The proposal is about removing the String overload as a special case. It would thus unconditionally append a newline regardless of the object’s contents.

> I’m not sure I follow in which direction you propose to change behavior. To always add a newline? Or to make it an alias to print?

To always add a newline, even if the string already ends with a newline.
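To make the difference concrete, here is a rough Ruby model of the two behaviors. It is only a sketch: `puts_special_cased` and `puts_unconditional` are hypothetical helper names that mirror the two Crystal overloads quoted above, not real API.

```ruby
require "stringio"

# Models the current String overload: skip the newline if one is already there.
def puts_special_cased(io, string)
  io << string
  io << "\n" unless string.end_with?("\n")
  nil
end

# Models the proposal: always append a newline, whatever the contents.
def puts_unconditional(io, obj)
  io << obj.to_s
  io << "\n"
  nil
end

current = StringIO.new
puts_special_cased(current, "line\n")

proposed = StringIO.new
puts_unconditional(proposed, "line\n")

p current.string  # prints "line\n"
p proposed.string # prints "line\n\n"
```

Under the proposal, a string that already ends with a newline produces a blank line after it.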

Also the comment in the original post was that with this code:

class Foo
  def as_string
    "A\nB\nC\n"
  end

  def to_s(io)
    io << as_string
  end
end

a = Foo.new
puts "One:"
puts a
puts "Two:"
puts a.to_s
puts "Done"

In Crystal the output is:

One:
A
B
C

Two:
A
B
C
Done

But in Ruby both outputs are the same.

That’s definitely unexpected, and I don’t think there’s any way for puts a and puts a.to_s to do different things in Ruby.
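A small Ruby check of that claim, as a sketch: `render` is a made-up helper that captures what puts writes into a StringIO, and `Foo` is reduced to just `to_s`.

```ruby
require "stringio"

class Foo
  def to_s
    "A\nB\nC\n"
  end
end

# Hypothetical helper: return exactly what `puts value` writes.
def render(value)
  out = StringIO.new
  out.puts value
  out.string
end

a = Foo.new
p render(a)      # prints "A\nB\nC\n"
p render(a.to_s) # prints "A\nB\nC\n"
```

Ruby's puts converts the object to a string first and only then checks for a trailing newline, so both calls write the same bytes.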

That’s the main thing. We are doing something special when you pass a string that ends with a newline, but if an object’s string representation ends with a newline then there’s no special case. It’s inconsistent. And that’s why I think we should remove this special case.

I see. I agree the special case for strings only is bad, however I don’t think behaving like Ruby does is bad. Do you think an API for returning the last byte that was written to an IO is too much? A general implementation of the newline behavior on top of that would then be trivial.


I guess having an IO#last_byte_written method would help to achieve this. We would probably need to store that byte on every write, though. For some IOs we can read it back from memory, but for a socket we’d have to keep track of it ourselves. It will add a bit of overhead, not sure how much. But I don’t know if all of this complexity is worth it just to avoid adding an extra newline.
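As a rough idea of what that could look like, here is a Ruby sketch of a wrapper that remembers the last byte written and uses it to decide whether puts needs a newline. Everything here (`LastByteIO`, `last_byte`, the `NEWLINE` constant) is hypothetical and only illustrates the bookkeeping. It is not a proposal for the actual Crystal implementation.

```ruby
require "stringio"

# Hypothetical wrapper that tracks the last byte written to the wrapped IO.
class LastByteIO
  NEWLINE = 10 # byte value of "\n"

  attr_reader :last_byte

  def initialize(io)
    @io = io
    @last_byte = nil
  end

  def write(data)
    str = data.to_s
    # The extra state that has to be updated on every single write.
    @last_byte = str.getbyte(-1) unless str.empty?
    @io.write(str)
  end

  # Works for any object: the check looks at what was written, not at the type.
  def puts(obj = nil)
    text = obj.to_s
    write(text)
    write("\n") if text.empty? || @last_byte != NEWLINE
    nil
  end
end

class Multi
  def to_s
    "A\nB\n"
  end
end

buffer = StringIO.new
io = LastByteIO.new(buffer)
io.puts "One:"
io.puts Multi.new # to_s already ends with "\n", so nothing is appended
io.puts "Two:"
io.puts           # bare puts still emits a blank line
p buffer.string   # prints "One:\nA\nB\nTwo:\n\n"
```

The cost is the `@last_byte` update on every write, which is the overhead mentioned above.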


I always chomp my input, so the first example does not make sense to me. You almost always have to clean up the input data anyway.

Also, the “use two puts, one with no arguments” approach to printing a blank line is not practical for me either. If you want to print something behind a condition, two puts statements have to go in a block, which means 4 lines instead of 1.

To be honest, this special case of puts annoys me more than it helps, and I don’t think it should be the default behavior.


I don’t think this argument should hold back changes that make the language more consistent or you end up with a mess of exceptions and edge cases.


Removing the special behavior is definitely a breaking change in terms of stdout formatting. That seems like an acceptable area of program behavior to break, in my opinion, when the benefit is language consistency and fixing the broken behavior will usually make the code easier to understand.


This is a big part of it. Ruby being heavily inspired by Perl’s TMTOWTDI to achieve its goal of trying to adapt to the programmer means puts handles both the print and println cases. Since Crystal chooses Python-style “one canonical way to do things”, I agree with removing the special case.

Another part is concurrency-safety. In Ruby, it’s only safe if your string itself ends with a newline. Simple example:

threads = [
  Thread.new { puts "first" },
  Thread.new { puts "second" },
]
threads.each(&:join)

This doesn’t necessarily produce the same output every single time. Do this enough times (especially in a hot loop) and you’ll sometimes end up with both lines on the same line and then 2 newlines in a row (newlines escaped for illustrative purposes):

firstsecond\n\n

However, if you end those strings with newlines, you are guaranteed to get this output every single time:

first\nsecond\n

This is because the write to STDOUT happens in two separate calls: one for the string passed to puts and one for the appended newline. There could be a context switch in between those calls. This also happens in Crystal for the same reason. The IO#<< yields the CPU if it’s doing actual I/O (so for example you can’t reproduce with an IO::Memory, and only sporadically with IO::Buffered implementations), allowing another fiber to be scheduled between the two lines, but if you end the string with a newline that race condition should go away as long as IO#<< is atomic within itself.
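One way to sidestep the two-call problem, sketched in Ruby: build the complete line first and hand it to the IO in a single write. `puts_in_one_write` is a hypothetical helper, and whether a single write call is itself atomic still depends on the IO implementation.

```ruby
require "stringio"

# Hypothetical helper: assemble the whole line, then issue exactly one write.
def puts_in_one_write(io, obj)
  line = obj.to_s
  line += "\n" unless line.end_with?("\n")
  io.write(line) # one call, so there is no gap between text and newline
  nil
end

out = StringIO.new
puts_in_one_write(out, "first")
puts_in_one_write(out, "second\n")
p out.string # prints "first\nsecond\n"
```

The trade-off is an extra string allocation per call when the newline has to be appended.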

That latter part, regarding the thread safety of the extra \n, seems to be fixed now (I’m not sure whether it is in any released version of Ruby yet), as they have started to place a lock around the file descriptor during writes.

I wish that particular idea would be incorporated into Crystal, as I can’t imagine nonatomic writes to an IO ever being an acceptable result, and I don’t think the lock would be costly in the uncontended case. But who knows, there may be some cases where the overhead would be too high.
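To illustrate the locking idea in user code, here is a minimal Ruby sketch of a wrapper that holds a Mutex for the duration of each puts. `LockedIO` is a made-up name, and this is not how the Ruby pull request or any Crystal change implements it.

```ruby
require "stringio"

# Hypothetical wrapper: one lock per IO, held across both writes of a puts.
class LockedIO
  def initialize(io)
    @io = io
    @lock = Mutex.new
  end

  def puts(obj)
    @lock.synchronize do
      @io.write(obj.to_s)
      @io.write("\n") # still two writes, but no other writer can interleave
    end
    nil
  end
end

buffer = StringIO.new
locked = LockedIO.new(buffer)

threads = 4.times.map do |worker|
  Thread.new do
    50.times { |i| locked.puts "worker #{worker} line #{i}" }
  end
end
threads.each(&:join)

# Every line is intact, although the order across workers is not fixed.
p buffer.string.lines.all? { |l| l.match?(/\Aworker \d line \d+\n\z/) } # prints true
```

Writers that bypass the wrapper and write to the underlying IO directly are not protected.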

EDIT: oh wait, it isn’t merged yet: Make write atomic. by ioquatix · Pull Request #5419 · ruby/ruby · GitHub


It’s awesome that they’re solving that!

Agreed, at least in theory. Realistically, a lot of wire protocols assume the connection isn’t being shared (such as HTTP/1.1, PostgreSQL, and Redis), so it’ll still work fine even if it’s not atomic, but protocols designed for massive scale often do multiplex a single connection to reduce resource usage (such as HTTP/2 and NATS) and nonatomic I/O will definitely ruin your day there.

Agreed here, too. Since Crystal mutexes are implemented entirely in user space, they’re pretty lightweight. IIRC last time I benchmarked it the additional cost was only ~10ns on average. That might be expensive and unnecessary for some operations, like high-throughput buffered I/O on a dedicated socket, so maybe there could be some ways to bypass safety if you need to optimize, but it’d probably be preferable for most cases, at least as a starting point.


Related discussion: puts in a multithreaded environment · Issue #8140 · crystal-lang/crystal · GitHub
