I am wondering if there’s an elegant way to solve this.
The standard way for implementing a graph class with arbitrary node attributes probably is generics (variant #1 below).
(It just has a bare minimum, not useful interface, I know).
But this implementation has the drawback that some algorithms working on it cannot define their own data structures for the graph, e.g. a is_node_visited?.
Variant #1: Graph class with internalized node attributes:
class Graph(T)
def initialize
@nodes = Set(Node(T)).new
end
def add_node : Node(T)
n = Node(T).new
@nodes.add(n)
n
end
def remove_node(node : Node(T))
@nodes.delete(node) # deletes Node along with instance data (obviously)
end
class Node(T)
property value = T.new
end
end
g = Graph(String).new
n = g.add_node
n.value = "foo"
p n.value
g.remove_node(n)
On the other hand, if we remove the node attributes entirely from the graph class and use a generic hash, we will run in memory leaks, if the attribute (and graph) live long and are growing and shrinking a lot - since the hash will always keep referencing the old nodes.
Variant #2: Graph class with externalized node attributes:
class Graph
def initialize
@nodes = Set(Node).new
end
def add_node : Node
n = Node.new
@nodes.add(n)
n
end
def remove_node(node : Node)
@nodes.delete(node) # just deletes the structure information, attributes are outside...
end
class Node
end
end
g = Graph.new
node_attrib = Hash(Graph::Node,String).new # generates a memory leak (when attribute lives long)
n = g.add_node
node_attrib[n] = "foo"
p node_attrib[n]
g.remove_node(n) # generates a memory leak in node_attrib (if attribute lives long)
Is there a way of defining a Hash-like data structure that is just capable of #[] and #[]=, but not able to enumerate its elements, hence not referencing them, so that the GC can do the cleanup all by itself?
Or is there any other elegant way (I know, one can always do some sort of reference counting on his own, but this seems very tedious)?
I think you’re looking for a WeakRef. With this you can make a WeakMap (name stolen from the same concept in Ruby). You can iterate over a WeakMap and, if the object has been GCed, it just won’t yield it:
class Person
def initialize(@name : String)
end
end
map = WeakMap(String, Person).new
map["jgaskins"] = Person.new("Jamie")
# Hold onto a reference to the Person object so it isn't GCed
wolfgang = Person.new("Wolfgang")
map["wolfgang371"] = wolfgang
pp map
# WeakMap(String,Person) {
# "jgaskins" => #<Person:0x1032f0e60 @name="Jamie">,
# "wolfgang371" => #<Person:0x1032f0e00 @name="Wolfgang">}
GC.collect
pp map
# WeakMap(String,Person) { "wolfgang371" => #<Person:0x1032f0e00 @name="Wolfgang">}
Notice that my entry was removed because nothing was referencing it other than the WeakMap itself, but since yours was stored in a variable, it was kept.
The implementation is here:
require "weak_ref"
class WeakMap(K, V)
include Enumerable({K, V})
@hash : Hash(K, WeakRef(V)) = Hash(K, WeakRef(V)).new
def []=(key : K, value : V)
@hash[key] = WeakRef.new(value)
end
def []?(key : K)
if ref = @hash[key]?
if value = ref.value
value
else
@hash.delete key
nil
end
end
end
def delete(key : K)
@hash.delete key
end
def each
@hash.each do |(key, ref)|
if value = ref.value
yield({key, value})
else
delete key
end
end
end
def pretty_print(pp)
prefix = "WeakMap(#{K},#{V}) {"
executed = exec_recursive(:pretty_print) do
pp.surround(prefix, "}", left_break: nil, right_break: nil) do
each_with_index do |(key, value), index|
if index == 0
pp.breakable
else
pp.comma
end
pp.group do
key.pretty_print(pp)
pp.text " => "
pp.nest do
pp.breakable ""
value.pretty_print(pp)
end
end
end
end
end
unless executed
pp.text "#{prefix} ...>"
end
end
end
A problem with that implementation is that if the only reference to a value comes from a key in the same WeakMap, that value can never be collected:
class Name
end
class Person
def initialize(@name : Name)
end
end
def foo
name = Name.new
# if either `name` below is replaced with `Name.new`,
# then the key-value pair gets collected successfully
WeakMap{Person.new(name) => name}
end
map = foo
pp map # => WeakMap(Person,Name) { #<Person:0x... @name=#<Name:0x...>> => #<Name:0x...>}
GC.collect
pp map # => WeakMap(Person,Name) { #<Person:0x... @name=#<Name:0x...>> => #<Name:0x...>}
thanks for your answer, I really wasn’t aware about WeakRef!
However, there are two catches:
First, in your case the GC.collect only works, since you effectively call each when pretty printing, just before. It still works, if you call pp map.size instead, but it doesn’t work anymore, if you add a method x to WeakMap that returns @hash.size and then call pp map.x instead.
Second, for my use case, e.g. is_node_visited? I actually need the key to be a WeakRef, somehow - not the value…
I threw that code together to get you started. It was not meant to be a complete solution that covers every edge case. If you need something more robust, you will need to make that happen.
This definitely makes it more challenging and requires wrapping every key in a WeakRef. This requires making WeakMap#[]?(key : K) then also wrap the given key in a WeakRef, which then in turn requires overriding Reference#hash(hasher) and Reference#==(other) because WeakRef does not. You can do that by subclassing WeakRef and using that as your key type so you don’t need to monkeypatch. For example (again, this is for illustrative purposes and may or may not even work as written):
class WeakMap(K,V)
private class Key(T) < WeakRef(T)
def hash(hasher)
value.hash hasher
end
def ==(other : Key(T))
value == other.value
end
end
end
For my first point, I wasn’t expecting a complete solution - I was trying to say that the GC can only do the work if we manually walk all WeakMap elements. I don’t see a robust way of doing so - or is there a way to tell the GC he has to call some methods before actually doing it’s work?
For your second answer: I will definitely have a closer look into it, that might solve a (big) part of the problem.
However, when I was just playing around a bit, it seems that the WeakRef demo in the docs doesn’t work for me (anymore?)… (I’m using Crystal 1.7.3)
require "weak_ref"
ref = WeakRef.new("oof".reverse)
p ref.value # => "foo"
GC.collect
p ref.value # => nil, according to docs, but now I do get "foo"
ref = WeakRef.new("oof".reverse) # reverse is also necessary, otherwise it will reference "oof", which is still in scope
p ref # # => "foo" - don't call .value, because then it'll stay on the stack, hence will not get GC'ed
GC.collect
p ref # ... @target=Pointer(Void).null
p ref.value # .value here is safe => nil
do you have a clue why this is not working?
class Graph
def initialize
@nodes = Hash(InternalNode, Node).new
end
def add_node : Node
ni = InternalNode.new
n = Node.new(ni)
@nodes[ni] = n
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node.value)
nil
end
private class InternalNode
end
alias Node = WeakRef(InternalNode)
end
g = Graph.new
n = g.add_node
p n
g.remove_node(n)
GC.collect
p n # should print ... @target=Pointer(Void).null
I tried to use a manual wrapping of WeakRef and also used #value carefully, but it doesn’t get freed.
I think there might be something in Hash representation that keeps a potential reference.
I did a trivial and ugly implementation based on array and nodes get freed after a second GC.collect
require "weak_ref"
class Graph
class Entry
property strong : InternalNode
property weak : Node
def_equals_and_hash weak
def initialize(@strong, @weak)
end
end
def initialize
@nodes = Array(Entry).new
end
def add_node : Node
ni = InternalNode.new
n = Node.new(ni)
@nodes << Entry.new(ni, n)
n
end
def remove_node(node : Node) : Nil
ni = uninitialized InternalNode
@nodes.delete(Entry.new(ni, node))
nil
end
private class InternalNode
end
alias Node = WeakRef(InternalNode)
end
g = Graph.new
n = g.add_node
p n
g.remove_node(n)
GC.collect
p n
GC.collect
p n
For the hash I tried to check if things where cleared in the @entries and it seems so. With the following you get a memory representation of the entries, which are the ones that eventually holds references to the keys AFAIK
Thanks for your investigations. The Array version works for me as well now.
I had another look into the Hash version.
Before I posted yesterday, I already tried if several GC.collects actually help - they didn’t.
However, if I change the last three lines from…
g.remove_node(n)
GC.collect
p n # should print ... @target=Pointer(Void).null
to
g.remove_node(n)
p n # should print ... @target=Pointer(Void).null
GC.collect
p n # should print ... @target=Pointer(Void).null
the last output actually is null.
It seems accessing the WeakRef between destroying the object and GC’ing is important.
As a result this code runs fine:
require "weak_ref"
class Graph
def initialize
@nodes = Hash(InternalNode, Node).new
end
def add_node : Node
ni = InternalNode.new
n = Node.new(ni)
@nodes[ni] = n
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node.value)
node.inspect # necessary, otherwise very last line does not yield "nil"
nil
end
private class InternalNode
end
alias Node = WeakRef(InternalNode)
end
g = Graph.new
n = g.add_node
g.remove_node(n)
GC.collect
p n.value # -> nil, if above "node.inspect" is in place
As for the WeakMap on the keys in the hash I now tried the following:
require "weak_ref"
class WeakMap(K, V)
private class Key(T) < WeakRef(T)
def_equals_and_hash value # identical to commented out block
# def hash(hasher)
# value.hash hasher
# end
# def ==(other : Key(T))
# value == other.value
# end
end
include Enumerable({Key(K), V})
@hash = Hash(Key(K), V).new
def []=(key : K, value : V)
@hash[Key.new(key)] = value
end
def []?(key : K)
@hash[Key.new(key)]?
end
def delete(key : K)
@hash.delete(Key.new(key))
end
def each
@hash.each do |(key, value)|
p ["each", key, key.value, value]
if key.value
yield({key, value})
else
p "never called"
@hash.delete(key)
end
end
end
end
class Graph
def initialize
@nodes = Set(Node).new
end
def add_node : Node
n = Node.new
@nodes.add(n)
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node)
nil
end
class Node
end
end
g = Graph.new
node_attrib = WeakMap(Graph::Node,String).new
n = g.add_node
node_attrib[n] = "foo"
g.remove_node(n)
node_attrib.each {}
n = g.add_node
node_attrib[n] = "bar"
GC.collect
node_attrib.each {}
GC.collect
p node_attrib.size # still 2
Now I have two versions, that are pretty much comparable, side-by-side: a specialized one that works and a generic one that doesn’t work.
Working specialized:
require "weak_ref"
class WeakMap(K,V)
include Enumerable({K,V})
@hash = Hash(K,V).new
def []=(key : K, value : V)
@hash[key] = value
end
def []?(key : K)
@hash[key]?
end
def delete(key : K)
@hash.delete(key)
end
def each
@hash.each do |(key, value)|
yield({key, value})
end
end
def clean
@hash.each_key do |k|
@hash.delete(k) if !k.value
end
end
end
class Graph
def initialize
@nodes = Hash(InternalNode, Node).new
end
def add_node : Node
ni = InternalNode.new
n = Node.new(ni)
@nodes[ni] = n
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node.value)
node.inspect # on weak type; necessary, otherwise very last line does not yield "nil"
nil
end
private class InternalNode
end
alias Node = WeakRef(InternalNode)
end
def fn(g, h)
n = g.add_node
h[n] = "foo"
g.remove_node(n)
end
g = Graph.new
h = WeakMap(Graph::Node, String).new
fn(g,h)
GC.collect
h.clean
p h.size # is 0
Broken generic:
require "weak_ref"
class WeakMap(K, V)
private class Key(T) < WeakRef(T)
def_equals_and_hash value # identical to commented out block
# def hash(hasher)
# value.hash hasher
# end
# def ==(other : Key(T))
# value == other.value
# end
end
include Enumerable({Key(K), V})
@hash = Hash(Key(K), V).new
def []=(key : K, value : V)
@hash[Key.new(key)] = value
end
def []?(key : K)
@hash[Key.new(key)]?
end
def delete(key : K)
@hash.delete(Key.new(key))
end
def each
@hash.each do |(key, value)|
yield({key, value})
end
end
def clean
@hash.each_key do |k|
@hash.delete(k) if !k.value
end
end
end
class Graph
def initialize
@nodes = Hash(Node,Bool).new
end
def add_node : Node
n = Node.new
@nodes[n] = true
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node)
node.inspect # on strong type?!
nil
end
class Node
end
end
def fn(g, h)
n = g.add_node
h[n] = "foo"
g.remove_node(n)
end
g = Graph.new
h = WeakMap(Graph::Node, String).new
fn(g,h)
GC.collect
h.clean
p h.size # should be 0, but is 1
ok, this time both memory management and semantics play
(sorry, just had to delete two stupid posts of mine)
require "weak_ref"
class WeakKeyMap(K, V)
# precondition (1): do _not_ subclass WeakRef(T)
private class Key(T)
def initialize(k : T)
@k = WeakRef(T).new(k)
end
def value
@k.value
end
def ==(other : Key(T)) # this is important for semantics
value == other.value
end
# precondition (2): do _not_ use this
# def_equals_and_hash @k.value
end
include Enumerable({Key(K), V})
@hash = Hash(Key(K), V).new
def initialize
# since...
# - GC has no hooks for cleaning
# - Hash iterators are all private, so we cannot store the iterator state here
# ... we have to use fibers
spawn do
while true
@hash.each do |(key, value)|
# precondition (3a): remove keys regularly
@hash.delete(key) if !key.value
Fiber.yield
end
Fiber.yield
end
end
end
def []=(key : K, value : V)
@hash[Key.new(key)] = value
end
def []?(key : K) : V|Nil
@hash[Key.new(key)]?
end
def [](key : K) : V
self[key]?.as(V) # V may or may not include Nil
end
def delete(key : K)
@hash.delete(Key.new(key))
end
def each
@hash.each do |(key, value)|
# precondition (3b): remove keys regularly
@hash.delete(key) if !key.value
yield({key, value})
end
end
end
class Graph
def initialize
@nodes = Set(Node).new
end
def add_node : Node
n = Node.new
@nodes.add(n)
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node)
nil
end
class Node
end
end
def fn(g, h)
n1 = g.add_node
h[n1] = "foo"
raise "fail" if h[n1]!="foo"
g.remove_node(n1)
n2 = g.add_node
h[n2] = "bar"
raise "fail" if h[n2]!="bar"
g.remove_node(n2)
n3 = g.add_node
h[n3] = "foo1"
raise "fail" if h[n3]!="foo1"
g.remove_node(n3)
n4 = g.add_node
h[n4] = "foo2"
raise "fail" if h[n4]!="foo2"
g.remove_node(n4)
n5 = g.add_node
h[n5] = "foo3"
raise "fail" if h[n5]!="foo3"
g.remove_node(n5)
n6 = g.add_node
h[n6] = "foo4"
raise "fail" if h[n6]!="foo4"
g.remove_node(n6)
end
g = Graph.new
h = WeakKeyMap(Graph::Node, String).new
fn(g,h)
p h.size # 6
GC.collect
4.times {Fiber.yield}
p h.size # 2
2.times {Fiber.yield}
p h.size # 0
My last “solution” doesn’t work on more complex data structures (not included here).
This is obviously due to my missing hash(hasher) method. If I include it, the semantics are fine, but the memory leak is back.
Now, even if I try to trick the GC as in the following code, I cannot find a way to do so - it even seems to be independent of the offset I add (0x1234… wouldn’t be helping anyhow)…
require "weak_ref"
# trying to work around the GC
lib C
# in C: void *memcpy(void *dest, const void *src, size_t n);
fun memcpy(dest : UInt64*, src : UInt64*, n : UInt32)
end
class WeakKeyMap(K, V)
# precondition (1): do _not_ subclass WeakRef(T)
private class Key(T)
def initialize(k : T)
@k = WeakRef(T).new(k)
end
def value
@k.value
end
def ==(other : Key(T)) # this is important for semantics
value == other.value
end
def hash(hasher)
v1 = @k.value.object_id
v2 = 0u64
C.memcpy(pointerof(v2), pointerof(v1), 4) # trying to work around GC logic
v2 += 0x1234567812345678u64 # doesn't remove memory leak
# v2 = 0u64 # uncommenting removes memory leak
v2.hash(hasher)
end
# precondition (2): do _not_ use this
# def_equals_and_hash @k.value
end
include Enumerable({Key(K), V})
@hash = Hash(Key(K), V).new
def initialize
# since...
# - GC has no hooks for cleaning
# - Hash iterators are all private, so we cannot store the iterator state here
# ... we have to use fibers
spawn do
while true
@hash.each do |(key, value)|
# precondition (3a): remove keys regularly
@hash.delete(key) if !key.value
Fiber.yield
end
Fiber.yield
end
end
end
def []=(key : K, value : V)
@hash[Key.new(key)] = value
end
def []?(key : K) : V|Nil
@hash[Key.new(key)]?
end
def [](key : K) : V
self[key]?.as(V) # V may or may not include Nil
end
def delete(key : K)
@hash.delete(Key.new(key))
end
def each
@hash.each do |(key, value)|
# precondition (3b): remove keys regularly
@hash.delete(key) if !key.value
yield({key, value})
end
end
end
class Graph
def initialize
@nodes = Set(Node).new
end
def add_node : Node
n = Node.new
@nodes.add(n)
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node)
nil
end
class Node
end
end
def fn(g, h)
n1 = g.add_node
h[n1] = "foo"
raise "fail" if h[n1]!="foo"
g.remove_node(n1)
n2 = g.add_node
h[n2] = "bar"
raise "fail" if h[n2]!="bar"
g.remove_node(n2)
n3 = g.add_node
h[n3] = "foo1"
raise "fail" if h[n3]!="foo1"
g.remove_node(n3)
n4 = g.add_node
h[n4] = "foo2"
raise "fail" if h[n4]!="foo2"
g.remove_node(n4)
n5 = g.add_node
h[n5] = "foo3"
raise "fail" if h[n5]!="foo3"
g.remove_node(n5)
n6 = g.add_node
h[n6] = "foo4"
raise "fail" if h[n6]!="foo4"
g.remove_node(n6)
end
g = Graph.new
h = WeakKeyMap(Graph::Node, String).new
fn(g,h)
p h.size # 6
GC.collect
4.times {Fiber.yield}
p h.size # 2
2.times {Fiber.yield}
p h.size # 0
After some more debugging it turned out that it’s not the GC’s fault, but the Key(T) has an issue.
The hash method just calculates different hashes for the keys before and after GC’ing, so it never succeeds removing the keys. Now I cache the latest Hasher and this seems to work for all my examples.
require "weak_ref"
class WeakKeyMap(K, V)
# precondition: do _not_ subclass WeakRef(T)
private class Key(T)
@hasher = Crystal::Hasher.new
def initialize(k : T)
@k = WeakRef(T).new(k)
end
def value
@k.value
end
def ==(other : Key(T))
value == other.value
end
def hash(hasher)
# this is crucial, since after GC is setting .value to nil, the hashes are different and deletion doesn't work anymore
if @k.value
@hasher = @k.value.hash(hasher)
else
@hasher
end
end
end
include Enumerable({Key(K), V})
@hash = Hash(Key(K), V).new
def initialize
# since...
# - GC has no hooks for cleaning
# - Hash iterators are all private, so we cannot store the iterator state here
# ... we have to use fibers
spawn do
while true
@hash.each do |(key, value)|
@hash.delete(key) if !key.value
Fiber.yield
end
Fiber.yield
end
end
end
def []=(key : K, value : V)
@hash[Key.new(key)] = value
end
def []?(key : K) : V|Nil
@hash[Key.new(key)]?
end
def [](key : K) : V
self[key]?.as(V) # V may or may not include Nil
end
def delete(key : K)
@hash.delete(Key.new(key))
end
def each
@hash.each do |(key, value)|
@hash.delete(key) if !key.value
yield({key, value})
end
end
end
class Graph
def initialize
@nodes = Set(Node).new
end
def add_node : Node
n = Node.new
@nodes.add(n)
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node)
nil
end
class Node
end
end
def fn(g, h)
n1 = g.add_node
h[n1] = "foo"
raise "fail" if h[n1]!="foo"
g.remove_node(n1)
n2 = g.add_node
h[n2] = "bar"
raise "fail" if h[n2]!="bar"
g.remove_node(n2)
n3 = g.add_node
h[n3] = "foo1"
raise "fail" if h[n3]!="foo1"
g.remove_node(n3)
n4 = g.add_node
h[n4] = "foo2"
raise "fail" if h[n4]!="foo2"
g.remove_node(n4)
n5 = g.add_node
h[n5] = "foo3"
raise "fail" if h[n5]!="foo3"
g.remove_node(n5)
n6 = g.add_node
h[n6] = "foo4"
raise "fail" if h[n6]!="foo4"
g.remove_node(n6)
end
g = Graph.new
h = WeakKeyMap(Graph::Node, String).new
fn(g,h)
p h.size # 6
GC.collect
4.times {Fiber.yield}
p h.size # 2
2.times {Fiber.yield}
p h.size # 0
Storing the entire Hasher value is not a good idea. This retains information about data outside this particular Key instance. The value of the hasher argument is derived from anything that was hashed before this particular Key instance. As part of a larger structure that is hashed as a whole, this practice prevents changes in anything that was hashed before a Key instance to embody. So the hash information would be wrong.
A better solution is to store a sub hash which only covers the key value.
def hash(hasher)
if value = @k.value
key_hasher = value.hash(Crystal::Hasher.new)
@hasher = key_hasher
else
key_hasher = @hasher
end
key_hasher.hash(hasher)
end
require "weak_ref"
class WeakKeyMap(K, V)
private class Key(T)
@hasher = Crystal::Hasher.new
def initialize(k : T)
@k = WeakRef(T).new(k) # using WeakRef(T) because subclassing doesn't work
end
def value
@k.value
end
def ==(other : Key(T))
value == other.value
end
def hash(hasher)
if value = @k.value
key_hasher = value.hash(Crystal::Hasher.new)
@hasher = key_hasher
else
# this is crucial, since after GC is setting .value to nil, the hashes are different and deletion doesn't work anymore
key_hasher = @hasher
end
key_hasher.hash(hasher)
end
end
include Enumerable({Key(K), V})
@hash = Hash(Key(K), V).new
def initialize
# since...
# - GC has no hooks for cleaning
# - Hash iterators are all private, so we cannot store the iterator state here
# ... we have to use fibers
spawn do
while true
@hash.each do |(key, value)|
@hash.delete(key) if !key.value
Fiber.yield
end
Fiber.yield
end
end
end
def []=(key : K, value : V)
@hash[Key.new(key)] = value
end
def []?(key : K) : V|Nil
@hash[Key.new(key)]?
end
def [](key : K) : V
self[key]?.as(V) # V may or may not include Nil
end
def delete(key : K)
@hash.delete(Key.new(key))
end
def each
@hash.each do |(key, value)|
@hash.delete(key) if !key.value
yield({key, value})
end
end
end
The only (minor) drawback is the ugly purging of the deleted keys. GC (post) hooks would be fine here, but monkey-patching doesn’t help.
Sorry, I just noticed the final expression in my previous comment was wrong: hasher.hash(key_hasher). It needs to be the other way round: key_hasher.hash(hasher).
Regarding the GC having no hooks, there is the undocumented GC.before_collect method you can pass a block that will execute … before the collection. So it will be one off in the collection cycle if you do clean ups there.