More slices, less alloc?

Noticed this today:

big_string.lines.select{...}

It spends a lot of time doing GC malloc for the Strings within the lines Array.

#5  malloc_atomic () at /usr/share/crystal/src/gc.cr:86                                                                                                                                                      
#6  new () at /usr/share/crystal/src/string.cr:247                                                                                                                                                           
#7  new () at /usr/share/crystal/src/string.cr:212                                                                                                                                                           
#8  new () at /usr/share/crystal/src/string.cr:160                                                                                                                                                           
#9  unsafe_byte_slice_string () at /usr/share/crystal/src/string.cr:5056                                                                                                                                     
#10 lines () at /usr/share/crystal/src/string.cr:3942                                                                                                                                                        
#11 lines () at /usr/share/crystal/src/string.cr:3906                                                                                                                                                        
#12 0x000000000040ac8c in __crystal_main () at mefast.cr:3          

Maybe it could be faster with some concept of just “pointing” to a Slice (i.e. sub slice) within a different (parent) String object?

Has anything like this been tried? I suppose it would still have to malloc something (new String object) but wondering if maybe malloc’ing smaller objects might be quicker (plus not having to copy bytes)?

String#split might also benefit similarly.

Cheers!

Try big_string.each_line.select { ... }. #lines allocates an array of strings and returns it, which you don’t really need to do if all you want to do is filter them.

each_line still allocates a string for each line. Allocation is just spread over the entire iteration sequence instead of all at once in the beginning.
You really only safe on allocation (and possible re-allocation) of the array.

I’m not sure which variant is more pleasant for the garbage collector.

@rogerdpack Do you mean to get a string-like value pointing to a window in the original string? That is, with specific string methods?
Or would an actual Slice do? If the latter, you could just use String#index and String#unsafe_byte_slice to split a string into slices separated by newlines.

I am doing some String operations on it so…I guess String would be more convenient. Still not entirely sure how much pressure off the GC it would take but it might save some RAM for the underlying copy… :)

This is actually the use case for Iterable/Iterator vs Enumerable. Iterables do not create intermediate arrays because they are iterated lazily. Enumerables will eagerly run every block for every element and create a whole new array out of it. You can think of it as iterables are depth-first iteration whereas enumerables are breadth-first.

I believe this is what @Blacksmoke16 was getting at. If you want fewer allocations, what you want is the iterable, not the enumerable.

Maybe it could be faster with some concept of just “pointing” to a Slice (i.e. sub slice) within a different (parent) String object?

Java used to do this optimization. It ended up being problematic for the GC in certain situations as people kept references to the subslices around, but no references to the original string. But the original string couldn’t be freed as there were internal pointers around pointing into it. This made performance and memory usage harder to predict than was nice. So in the end they removed it.

Perhaps it could be an opt-in option for places where it is known to be safe, but I really don’t think it should be the default behaviour.

6 Likes

Right, that’s exactly why we didn’t do it like that.

My idea with Crystal is to have the default API be user-friendly and reasonably efficient, and that if you need extra performance you can still do it, but with a less pleasant API, and where it’s clear that there’s something different going on.

In this case, the type that can share things with pointers is Slice. So maybe String could have an each_line_slice that yields Bytes for each line. Then if you want you can turn those into Strings, or maybe you can avoid this depending on your use case.

Then the problem is that Slice isn’t as powerful as String regarding string-related methods, so that’s another thing we should eventually take care of. For example you can turn a String into an Int32, but you can’t turn Bytes into an Int32, even though it would be really convenient (especially for parsers, to be able to reuse the logic behind parsing ints.)

@rogerdpack In your big_string.lines.select { .... } what is your use case? What do you need to do with each line? And what’s in the .select { ... } block?

2 Likes

That makes sense.
Those are good ideas.

Basic gist was

File.read("big_file").lines.select{ |l| l =~ / \d / && l.split[3] == "xxx"}…

Allocs for the #lines and the #split call (size 8 or so).

Cheers!

Before thinking about advanced optimizations such as slicing into string windows, I’d suggest to catch the low hanging fruit. You can already achieve a great improvement by just using allocation-less APIs.

  • IO#each_line avoids an intermediary array allocation.
  • Even better, File.each_line avoids loading the entire file into memory at once.
  • String#split unecessarily allocates a string for each word (and an array). The yielding overload can avoid some unnecessary allocations (array and words following after the third one).
  • Even better, you could completely avoid alloactions for the “xxx” check with a custom implementation for searching inside the string. If words are separated by space character, you can easily use #index for that. If it can be other white space characters, you’ll probably need a Char::Reader. A regular expression might also work.
1 Like

Those are some good ideas.

One take home was that single_byte_optimizable? seemed to be pretty slow on the newly created strings, maybe it could inherit if from its predecessor?

Also resizing arrays seemed a bit slow (growing from size 0 to 10, once per line), but I haven’t examined the effect yet of either of these…

That already happens in many cases, but probably not in all. Where did you notice the slowness?

That’s weird I can’t seem to repro the slowness on single_byte_optimizable and no longer have access to the code so maybe it has been fixed or I did something wrong. Thanks. :)