So I have had a few ides that have fallen flat so take this with a gain of salt.
What if some string methods did not return strings but a struct with pointers into strings. Strings are immutable so there should not be case were an existing string would change and invalidate a view. Consider the code below.
struct StringView
def initialize(
@buffer : Pointer(UInt8),
@start : Int32,
@size : Int32
)
end
def to_s(io : IO) : Nil
io.write_string(Bytes.new(@size) { |i|
(@buffer + i).value.to_u8
})
end
def [](index) : Char
@buffer[index].unsafe_chr
end
end
Some String methods could use what I called string views which are pointers into strings. This allocates a few ints on the stack instead of allocating new strings on the stack. Here is an example of the mutations to the string class.
class String
def byte_view(start : Int, count : Int) : StringView
StringView.new(to_unsafe, start, count)
end
def chomp_view
return StringView.new(to_unsafe, 0, 0) if empty?
case to_unsafe[bytesize - 1]
when '\n'
if bytesize > 1 && to_unsafe[bytesize - 2] === '\r'
StringView.new(to_unsafe, 0, bytesize - 2)
else
StringView.new(to_unsafe, 0, bytesize - 1)
end
when '\r'
StringView.new(to_unsafe, 0, bytesize - 1)
else
StringView.new(to_unsafe, 0, bytesize)
end
end
NEWLINE_U8 = '\n'.ord.to_u8
RETURN_CHAR = '\r'
def each_line_view(chomp = true, &block : StringView ->) : Nil
# return if empty?
offset = 0
while byte_index = byte_index(NEWLINE_U8, offset)
count = byte_index - offset + 1
if str = chomp_view
count -= 1
count -= 1 if offset + count > 0 && str[offset + count - 1] === RETURN_CHAR
end
yield StringView.new(to_unsafe, offset, count)
offset = byte_index + 1
end
StringView.new(to_unsafe, 0, 0) unless offset == bytesize
end
def strip_view
excess_left = calc_excess_left
if excess_left == bytesize
return StringView.new(to_unsafe, 0, 0)
end
excess_right = calc_excess_right
StringView.new(
to_unsafe,
excess_left,
excess_left + excess_right
)
end
end
From some simple benchmarks I made from the sample code from crystal-lang.org it seems like there are performance advantages from 2X to 6X. What am I missing???
here is a link to some sample code.
I know I am missing something.