What do you think about adding Rails’ String#squish to the standard library?
It’s a pretty useful method to have in specs when you want to compare one string with another and you don’t care about the exact whitespace (spaces, newlines, etc.) in the output, only about the content (the non-whitespace part).
These would be the tests for this:
describe "squish" do
  it { " ".squish.should eq("") }
  it { "a".squish.should eq("a") }
  it { "abc".squish.should eq("abc") }
  it { " abc ".squish.should eq("abc") }
  it { " abc \n\r\t def ".squish.should eq("abc def") }
  it { " \n\t abc \n\r\t def \n g h ".squish.should eq("abc def g h") }
end
And this is an example implementation:
class String
  def squish : String
    reader = Char::Reader.new(self)

    # Skip initial whitespace
    while reader.current_char.whitespace?
      reader.next_char
    end

    # If we reached the end, we are done
    return "" unless reader.has_next?

    String.build(bytesize) do |io|
      loop do
        # Skip over non-whitespace
        from_pos = reader.pos
        while reader.has_next? && !reader.current_char.whitespace?
          reader.next_char
        end

        # Copy it to the final String
        io.write(to_slice[from_pos, reader.pos - from_pos])

        # Skip whitespace
        while reader.current_char.whitespace?
          reader.next_char
        end

        # If we reached the end, no need to append the trailing whitespace
        break unless reader.has_next?

        # Append a single whitespace
        io << ' '
      end
    end
  end
end
Looks handy, but imo a bit too specific for the standard lib (though I would never mind it if it was added anyway; it’s just something I would never expect, so I probably would not even check the standard lib for it even if it was provided).
I’d like more flexibility, for example joining sequences of newlines. This could be achieved easily by providing the matcher as a block argument and the replacement char as a regular arg.
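A rough sketch of what that could look like, just to make the idea concrete. The name `squish_with` and its signature are made up for illustration and are not an existing or proposed stdlib API; it uses `each_char` rather than the `Char::Reader` approach above, so it is not tuned for speed.

```crystal
class String
  # Hypothetical generalization (not part of the stdlib): every run of
  # characters for which the block returns true is collapsed into a single
  # `replacement`; leading and trailing runs are dropped entirely.
  def squish_with(replacement : Char = ' ', & : Char -> _) : String
    String.build(bytesize) do |io|
      pending = false # a matched run was seen after some kept content
      wrote = false   # at least one kept character has been written

      each_char do |char|
        if yield(char)
          # Only remember the run if there is content before it
          pending = wrote
        else
          io << replacement if pending
          io << char
          pending = false
          wrote = true
        end
      end
    end
  end
end

# Same behavior as squish when matching whitespace
p " abc \n\r\t def ".squish_with(&.whitespace?)

# Joining sequences of newlines into one newline
p "a\n\n\nb\n\nc\n".squish_with('\n') { |c| c == '\n' }
```

With a whitespace matcher and the default replacement this should behave like `squish`; passing `'\n'` and a newline matcher joins runs of newlines instead.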
I’m wondering about the similarity to #squeeze. Semantically, this would be equivalent to squeeze(&.whitespace?) with a custom fixed replacement character (' ').
Perhaps we could consider merging the implementation into that method? Then you could do squeeze(' ', &.whitespace?). It might still be available as a separate method #squish (#squeeze without arguments is already taken).
Yeah, I thought about that, but squish also removes leading and trailing whitespace, which squeeze doesn’t do.
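To illustrate the difference, here is a small comparison. It assumes the regex-based `squish` from this thread purely as a stand-in; the input string is made up. Note that `squeeze` only merges repeats of the same character, so a mixed run such as a space followed by a newline also survives.

```crystal
class String
  # Stand-in for the proposed method, using the regex version from this
  # thread; not part of the stdlib.
  def squish : String
    gsub(/\s+/, " ").strip
  end
end

input = "  abc \n\n def  "

# squeeze collapses repeated identical whitespace characters only
squeezed = input.squeeze(&.whitespace?)

# squish collapses any whitespace run to one space and strips both ends
squished = input.squish

p squeezed # => " abc \n def "
p squished # => "abc def"
```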
I like it, although I think it could be better with a few options.
- Control what the single replacement whitespace character is (space, tab, newline).
- Allow keeping linebreaks, but removing blank lines.
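The second option could be approximated along these lines. This is only a sketch: `squish_lines` is a hypothetical name, and it uses the simple regex approach per line instead of a hand-written scanner.

```crystal
class String
  # Hypothetical helper (not part of the stdlib): squishes each line on its
  # own, drops lines that end up empty, and keeps one "\n" between the rest.
  def squish_lines : String
    lines
      .map { |line| line.gsub(/\s+/, " ").strip }
      .reject(&.empty?)
      .join("\n")
  end
end

p " foo  bar \n\n\n  baz\t\tqux \n".squish_lines # => "foo bar\nbaz qux"
```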
I’m able to pass all test cases with the following snippet:
class String
  def squish
    gsub(/\s+/, " ").strip
  end
end
Great! Can you run a benchmark between the two alternatives to see which one is faster?
I popped them both into a quick benchmark, which checks both shorter and slightly longer strings, and ran it on macOS on both Intel and ARM CPUs:
Benchmark code
require "benchmark"
string = nil
short = " abc def "
long = <<-STRING
foo
bar baz omg
asdf lasjkdbflk flk l kl a sdfklhj kljh lk laksdjf laksdfj laksd jflkas djflk df
STRING
puts "Short strings"
Benchmark.ips do |x|
  s = short
  x.report "WintereDesert" { string = s.squish_wintere_desert }
  x.report "ary's" { string = s.squish_ary }
end
puts
puts "Long strings"
Benchmark.ips do |x|
  s = long
  x.report "WintereDesert" { string = s.squish_wintere_desert }
  x.report "ary's" { string = s.squish_ary }
end
# This won't do anything, but LLVM can't tell that so it won't optimize it out
pp string unless string
class String
  def squish_ary : String
    reader = Char::Reader.new(self)

    # Skip initial whitespace
    while reader.current_char.whitespace?
      reader.next_char
    end

    # If we reached the end, we are done
    return "" unless reader.has_next?

    String.build(bytesize) do |io|
      loop do
        # Skip over non-whitespace
        from_pos = reader.pos
        while reader.has_next? && !reader.current_char.whitespace?
          reader.next_char
        end

        # Copy it to the final String
        io.write(to_slice[from_pos, reader.pos - from_pos])

        # Skip whitespace
        while reader.current_char.whitespace?
          reader.next_char
        end

        # If we reached the end, no need to append the trailing whitespace
        break unless reader.has_next?

        # Append a single whitespace
        io << ' '
      end
    end
  end

  def squish_wintere_desert
    gsub(/\s+/, " ").strip
  end
end
Results
The simpler solution is definitely easier to read, but takes roughly 2.6-5x as long, likely because it allocates about 2-4.5x as much heap memory per operation.
Intel
Short strings
WintereDesert   2.75M (364.26ns) (± 0.67%)   272B/op   2.94× slower
        ary's   8.08M (123.69ns) (± 0.68%)   128B/op        fastest
Long strings
WintereDesert 495.95k (  2.02µs) (± 1.06%)  1.0kB/op   2.65× slower
        ary's   1.31M (761.10ns) (± 2.88%)   224B/op        fastest
M1/ARM
Short strings
WintereDesert   2.66M (376.07ns) (± 0.67%)   272B/op   4.75× slower
        ary's  12.63M ( 79.19ns) (± 2.92%)   128B/op        fastest
Long strings
WintereDesert 380.66k (  2.63µs) (± 0.93%)  1.0kB/op   5.04× slower
        ary's   1.92M (520.83ns) (± 0.82%)   224B/op        fastest