Implementation of ARC (automatic reference counting) in Crystal

@asterite @straight-shoota @bcardiff @RX14 @ysbaddaden

I am thinking about giving Crystal extra flexibility by implementing configurable ARC (automatic reference counting; Swift is an example of an ARC language, and Rust achieves something similar through compile-time ownership).

I am thinking of piggybacking on the GC and implementing a visitor that will walk the AST, insert acquire and release calls, and occasionally emit GC.free on reference variables that are no longer used.
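
For illustration, here is roughly what such inserted calls could boil down to; the ARC module and the exact insertion point are just my assumptions, not existing compiler behavior, while GC.free is the existing stdlib call:

module ARC
  # Hypothetical helper the visitor could emit; frees the object's memory
  # through the existing GC API once we know it is no longer referenced.
  def self.release(obj : Reference)
    GC.free(obj.as(Void*))
  end
end

class User
  getter name : String

  def initialize(@name : String)
  end
end

def greet(user : User)
  puts "Hello, #{user.name}"
end

user = User.new("Ada")
greet(user)        # the reference does not escape greet
ARC.release(user)  # the call the visitor would insert after the last use of `user`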

Do you think it is feasible?

I want to test execution speed for different GC approaches. I see that -D gc_none does not free memory, but maybe an ARC approach can. Or we might not even need acquire and release calls if we know exactly how the variable's value is passed along, all the way to the place where we can safely deallocate it.

1 Like

It's not going to be easy in Crystal, since there are so many different ways an object can be referenced (different scopes, objects, etc.).
I did think it might be nice to have an optional "different GC", for example a reference-counted one like Python's (is? used to be?), which I don't think is the same as ARC but might be convenient in certain cases like https://github.com/crystal-lang/crystal/issues/8788
It "should" be possible; you'd just need to write a "traverse" method for "all the different objects" and use that for GC…
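
As a rough illustration of that traverse idea (the names here are made up, nothing like it exists in the stdlib): every collectable class reports the references it holds, and a reference-counting or cycle-detecting collector walks them.

class Node
  property children = [] of Node

  # Report every reference this object holds, so a collector could
  # trace or count them.
  def traverse(&block : Node ->)
    children.each { |child| block.call(child) }
  end
end

root = Node.new
root.children << Node.new
root.traverse { |child| puts child }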

1 Like

It sounds doable with a lot of work. If you find it fun, I suggest you go ahead and try adding it!

3 Likes

We do "type painting" of variables anyway, so this is similar. As long as we know a variable is used, it will be allocated in new anyway, so we have to find where it goes out of scope completely and add a deallocator call there. If it is part of an array, we don't deallocate it until it is explicitly set to nil or the array itself is deallocated or reallocated (an edge case, but it may happen that some of the variables go out of scope).

It is a lot of fun, and I will do it according to the priorities I have assigned to my Crystal projects.
At the moment I want debugging support to be merged into the main branch. Then I want to look at the scry implementation of the VSCode plugin and improve it with a debug adapter so it does not rely on CodeLLDB. Once that works, I want to finish implementing my CrMacDbg and make it part of Crystal (Linux and Windows ports will come later), and after that I plan to work on an ARC GC.

3 Likes

One…maybe interesting optimization would be to have Crystal analyze methods and, if they have objects that are "never shared", deallocate them when the method ends, a la C++ with its local variable destruction. You could do that in combination with the current GC. I have no idea how helpful it would be speed-wise, but…might be interesting…you could still only "free" that object, not its internal members, in case a reference to them had already been added somewhere else though…how does C++ work around this? Do you always have to copy anything you want to keep after the method ends? Yikes, sounds painful…

That being said, it might be fun to write a GC "in Crystal" (though if you wanted to support MT you'd have to implement some stop-the-world stuff…).

1 Like

Yes, that was on my mind too. I also want to analyze all allocations and variable assignments and see where a variable stops being reassigned and used, so we can add a free call at the end of that flow.

This is called escape analysis. It's very hard to implement. You need to know whether a reference outlives the method scope, for example if you pass it to another method that stores that reference.
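
A small illustration of the escaping case (Registry and Connection are made-up names here): the callee stores the reference, so it outlives the caller's scope and must not be freed there.

class Connection
end

class Registry
  @@connections = [] of Connection

  def self.register(conn : Connection)
    @@connections << conn # the reference escapes: it now outlives `handle`
  end
end

def handle
  conn = Connection.new
  Registry.register(conn)
  # inserting a free here would be unsafe: Registry still holds `conn`
end

handle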

These priorities sound awesome to me.

Regarding ARC: at least Swift and Obj-C can do ARC thanks to some acquire/release conventions. We don't enforce them, so trying to do ARC will mean implementing and honoring something similar.
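
To make that convention concrete, here is a toy sketch with made-up names (nothing like this exists or is enforced in Crystal today) of what explicit acquire/release counting looks like:

class RefCounted
  @refcount = 1 # the creator starts as the single owner

  # Another owner takes a reference.
  def acquire
    @refcount += 1
    self
  end

  # An owner gives up its reference; the last release deallocates.
  def release
    @refcount -= 1
    puts "#{self.class} deallocated" if @refcount == 0
  end
end

class Image < RefCounted
end

img = Image.new    # refcount = 1
copy = img.acquire # refcount = 2
copy.release       # refcount = 1
img.release        # refcount = 0, the object can be freed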

1 Like

They sound right to me too. :slight_smile: It should unblock a lot of people to try Crystal out.

I know that escape analysis is somewhat complicated, but we can work around it. We can do automatic deallocation for the variable assignments we can trace and paint, but also allow the programmer to call free explicitly on a variable.
We can have a tool that shows which variables will be deallocated in every function, so if a developer does not see a variable that is supposed to be deallocated, they can free it explicitly.

PS: This is also my answer to @asterite's response to me.

1 Like

A first step may be to get fully functional versions of

  • Immix GC and
  • Immix RC

implemented. Then,

  1. see through static analysis where one can deallocate, and
  2. identify GC/RC candidates and apply them.

As the implementation matures, try to catch more and more cases through static analysis rather than through GC/RC.

If too many allocate and deallocate calls are made, this will also impact performance, so they will need to be combined into fewer calls, with region-based allocation, defrag, and region deallocation.
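
As a toy illustration of the region idea (the Region class and its API are my own invention for this sketch): many small allocations come out of one block, and a single call releases them all.

class Region
  def initialize(size : Int32)
    @buffer = Pointer(UInt8).malloc(size)
    @size = size
    @offset = 0
  end

  # Bump-allocate `bytes` out of the region.
  def allocate(bytes : Int32) : Pointer(UInt8)
    raise "region exhausted" if @offset + bytes > @size
    ptr = @buffer + @offset
    @offset += bytes
    ptr
  end

  # One call releases everything that was allocated from the region.
  def release
    @offset = 0
  end
end

region = Region.new(1024)
header = region.allocate(64)
body = region.allocate(512)
region.release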

2 Likes

In my experiments I want to avoid any GC whatsoever, to see if it is possible to improve performance.
Then I will compare my approach against GC approaches to see which is the winner.

But I have a gut feeling that the winner will be a hybrid approach:

  1. Mark regions as deallocated via free calls.
  2. Check how much memory is available at that moment (see the sketch after this list):
  3. If it is low, do aggressive memory cleanup and defragmentation (since we only hold pointers to the allocated memory, we can move the memory contents closer together and update the pointer on the stack to the new address; as we mostly use GEPs, access to memory is relative to that pointer, so we should be reasonably safe doing it).
  4. If memory is okay, do the same less aggressively than in item 3, within an allocated time budget.
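
A rough sketch, using the existing GC.stats API, of how the check in step 2 and the aggressive/relaxed split in steps 3 and 4 could be driven (the thresholds are arbitrary, and the compaction itself is out of scope here):

def maybe_collect
  stats = GC.stats
  free_ratio = stats.free_bytes.to_f / stats.heap_size.to_f

  if free_ratio < 0.1
    # low on free memory: do a full, aggressive collection right away
    GC.collect
  else
    # enough headroom: defer, or do smaller bounded work within a time budget
  end
end

maybe_collect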

If the GC does not allow it, I will probably experiment with my own, using ideas from Boehm and Immix during the implementation.

This is all assuming the GC is slow.

Unity seems to be using the Boehm GC. That's for 2D and 3D games.

Where can we see that the current GC is slow?

3 Likes

Speed isn't the only factor. If I understand correctly, RC would mean that finalizers for most objects (for all of them, if you take extra effort not to create cyclic links) would be called at a deterministic time. This would allow using finalize to manage many resources (such as files, sockets, and objects allocated by external libraries) in RAII fashion. The "method with block" idiom is nice, but RAII gives more flexibility while still protecting the user from errors.
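
To make the contrast concrete, here is a small sketch: the block idiom is scoped and deterministic today, while the LogFile wrapper (a made-up class) relies on finalize, which would only become deterministic under RC:

# Block idiom: the file is closed when the block ends, deterministically.
File.open("data.txt", "w") do |file|
  file.puts "hello"
end

# RAII-style wrapper relying on finalize; with RC the file would be closed
# as soon as the last reference to `log` goes away, instead of "eventually".
class LogFile
  def initialize(path : String)
    @file = File.new(path, "w")
  end

  def write(line : String)
    @file.puts line
  end

  def finalize
    @file.close
  end
end

log = LogFile.new("app.log")
log.write "started"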

1 Like

Ary, I am not saying that it is slow.
I just trust what I see and test myself, to prove it or disprove it.
A second opinion never hurts, right?
I do have a researcher's mindset where I doubt everything until I test it myself, and even then I will doubt my own results.
This is the reason why an ARC GC is quite low in my priorities: it will take a substantial amount of time to test and play with it.

1 Like

I believe a hybrid approach would be the winner.

By the way, I am reading through the Boehm GC source code, and I really like the quality of the code and the well-written comments that help to understand the idea behind the code.

I think we can play a little with Boehm's parameters to tweak it on the Crystal side and do some incremental GC.
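
One hedged sketch of such an experiment: Boehm's C API has GC_enable_incremental(), and it can be called from Crystal by declaring the binding ourselves (whether the stdlib's LibGC already exposes it may depend on the Crystal version):

@[Link("gc")]
lib LibBoehm
  fun enable_incremental = GC_enable_incremental
end

# Ask Boehm to spread collection work over small increments.
LibBoehm.enable_incremental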

4 Likes

I think incremental GC trades efficiency (MB collected/sec) and average CPU usage for minimal GC pauses. So it is useful for games, but not for all applications.

I can give you an example where incremental GC is very beneficial:

I have the https://github.com/skuznetsov/java-class-reader.cr library, which initially works very fast on a big amount of data, but at the point where it almost saturates memory (it has quite a lot of temporary objects on the heap, which I am trying to minimize at the moment) it starts to crawl (it is trying to GC some memory, and swapping also affects it). So with an incremental GC with an assigned quota, it might work much faster in the long run.

Also, early incremental GC is a sign of a good application that cares about the resources it shares with other applications on the computer.

1 Like

Boehm already does incremental collection, doesn't it?

I do like this idea of the compiler somehow "knowing" whether an object is used or not, e.g.:

def go
  a = [1,2,3]
  go2(a)
end

If it somehow "knows" that go2 doesn't retain any reference to a, then at the end of go it could garbage collect a…
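
For example, with a go2 like this (hypothetical, it only reads the array), nothing ever stores a, so such a pass could prove the array never escapes and free it when go returns:

def go2(a)
  puts a.sum # only reads the array; no reference to `a` is kept anywhere
end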