thanks for your answer, I really wasn’t aware about WeakRef!
However, there are two catches:
First, in your case the GC.collect only works, since you effectively call each when pretty printing, just before. It still works, if you call pp map.size instead, but it doesn’t work anymore, if you add a method x to WeakMap that returns @hash.size and then call pp map.x instead.
Second, for my use case, e.g. is_node_visited? I actually need the key to be a WeakRef, somehow - not the value…
I threw that code together to get you started. It was not meant to be a complete solution that covers every edge case. If you need something more robust, you will need to make that happen.
This definitely makes it more challenging and requires wrapping every key in a WeakRef. This requires making WeakMap#[]?(key : K) then also wrap the given key in a WeakRef, which then in turn requires overriding Reference#hash(hasher) and Reference#==(other) because WeakRef does not. You can do that by subclassing WeakRef and using that as your key type so you don’t need to monkeypatch. For example (again, this is for illustrative purposes and may or may not even work as written):
class WeakMap(K,V)
private class Key(T) < WeakRef(T)
def hash(hasher)
value.hash hasher
end
def ==(other : Key(T))
value == other.value
end
end
end
For my first point, I wasn’t expecting a complete solution - I was trying to say that the GC can only do the work if we manually walk all WeakMap elements. I don’t see a robust way of doing so - or is there a way to tell the GC he has to call some methods before actually doing it’s work?
For your second answer: I will definitely have a closer look into it, that might solve a (big) part of the problem.
However, when I was just playing around a bit, it seems that the WeakRef demo in the docs doesn’t work for me (anymore?)… (I’m using Crystal 1.7.3)
require "weak_ref"
ref = WeakRef.new("oof".reverse)
p ref.value # => "foo"
GC.collect
p ref.value # => nil, according to docs, but now I do get "foo"
ref = WeakRef.new("oof".reverse) # reverse is also necessary, otherwise it will reference "oof", which is still in scope
p ref # # => "foo" - don't call .value, because then it'll stay on the stack, hence will not get GC'ed
GC.collect
p ref # ... @target=Pointer(Void).null
p ref.value # .value here is safe => nil
do you have a clue why this is not working?
class Graph
def initialize
@nodes = Hash(InternalNode, Node).new
end
def add_node : Node
ni = InternalNode.new
n = Node.new(ni)
@nodes[ni] = n
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node.value)
nil
end
private class InternalNode
end
alias Node = WeakRef(InternalNode)
end
g = Graph.new
n = g.add_node
p n
g.remove_node(n)
GC.collect
p n # should print ... @target=Pointer(Void).null
I tried to use a manual wrapping of WeakRef and also used #value carefully, but it doesn’t get freed.
I think there might be something in Hash representation that keeps a potential reference.
I did a trivial and ugly implementation based on array and nodes get freed after a second GC.collect
require "weak_ref"
class Graph
class Entry
property strong : InternalNode
property weak : Node
def_equals_and_hash weak
def initialize(@strong, @weak)
end
end
def initialize
@nodes = Array(Entry).new
end
def add_node : Node
ni = InternalNode.new
n = Node.new(ni)
@nodes << Entry.new(ni, n)
n
end
def remove_node(node : Node) : Nil
ni = uninitialized InternalNode
@nodes.delete(Entry.new(ni, node))
nil
end
private class InternalNode
end
alias Node = WeakRef(InternalNode)
end
g = Graph.new
n = g.add_node
p n
g.remove_node(n)
GC.collect
p n
GC.collect
p n
For the hash I tried to check if things where cleared in the @entries and it seems so. With the following you get a memory representation of the entries, which are the ones that eventually holds references to the keys AFAIK
Thanks for your investigations. The Array version works for me as well now.
I had another look into the Hash version.
Before I posted yesterday, I already tried if several GC.collects actually help - they didn’t.
However, if I change the last three lines from…
g.remove_node(n)
GC.collect
p n # should print ... @target=Pointer(Void).null
to
g.remove_node(n)
p n # should print ... @target=Pointer(Void).null
GC.collect
p n # should print ... @target=Pointer(Void).null
the last output actually is null.
It seems accessing the WeakRef between destroying the object and GC’ing is important.
As a result this code runs fine:
require "weak_ref"
class Graph
def initialize
@nodes = Hash(InternalNode, Node).new
end
def add_node : Node
ni = InternalNode.new
n = Node.new(ni)
@nodes[ni] = n
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node.value)
node.inspect # necessary, otherwise very last line does not yield "nil"
nil
end
private class InternalNode
end
alias Node = WeakRef(InternalNode)
end
g = Graph.new
n = g.add_node
g.remove_node(n)
GC.collect
p n.value # -> nil, if above "node.inspect" is in place
As for the WeakMap on the keys in the hash I now tried the following:
require "weak_ref"
class WeakMap(K, V)
private class Key(T) < WeakRef(T)
def_equals_and_hash value # identical to commented out block
# def hash(hasher)
# value.hash hasher
# end
# def ==(other : Key(T))
# value == other.value
# end
end
include Enumerable({Key(K), V})
@hash = Hash(Key(K), V).new
def []=(key : K, value : V)
@hash[Key.new(key)] = value
end
def []?(key : K)
@hash[Key.new(key)]?
end
def delete(key : K)
@hash.delete(Key.new(key))
end
def each
@hash.each do |(key, value)|
p ["each", key, key.value, value]
if key.value
yield({key, value})
else
p "never called"
@hash.delete(key)
end
end
end
end
class Graph
def initialize
@nodes = Set(Node).new
end
def add_node : Node
n = Node.new
@nodes.add(n)
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node)
nil
end
class Node
end
end
g = Graph.new
node_attrib = WeakMap(Graph::Node,String).new
n = g.add_node
node_attrib[n] = "foo"
g.remove_node(n)
node_attrib.each {}
n = g.add_node
node_attrib[n] = "bar"
GC.collect
node_attrib.each {}
GC.collect
p node_attrib.size # still 2
Now I have two versions, that are pretty much comparable, side-by-side: a specialized one that works and a generic one that doesn’t work.
Working specialized:
require "weak_ref"
class WeakMap(K,V)
include Enumerable({K,V})
@hash = Hash(K,V).new
def []=(key : K, value : V)
@hash[key] = value
end
def []?(key : K)
@hash[key]?
end
def delete(key : K)
@hash.delete(key)
end
def each
@hash.each do |(key, value)|
yield({key, value})
end
end
def clean
@hash.each_key do |k|
@hash.delete(k) if !k.value
end
end
end
class Graph
def initialize
@nodes = Hash(InternalNode, Node).new
end
def add_node : Node
ni = InternalNode.new
n = Node.new(ni)
@nodes[ni] = n
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node.value)
node.inspect # on weak type; necessary, otherwise very last line does not yield "nil"
nil
end
private class InternalNode
end
alias Node = WeakRef(InternalNode)
end
def fn(g, h)
n = g.add_node
h[n] = "foo"
g.remove_node(n)
end
g = Graph.new
h = WeakMap(Graph::Node, String).new
fn(g,h)
GC.collect
h.clean
p h.size # is 0
Broken generic:
require "weak_ref"
class WeakMap(K, V)
private class Key(T) < WeakRef(T)
def_equals_and_hash value # identical to commented out block
# def hash(hasher)
# value.hash hasher
# end
# def ==(other : Key(T))
# value == other.value
# end
end
include Enumerable({Key(K), V})
@hash = Hash(Key(K), V).new
def []=(key : K, value : V)
@hash[Key.new(key)] = value
end
def []?(key : K)
@hash[Key.new(key)]?
end
def delete(key : K)
@hash.delete(Key.new(key))
end
def each
@hash.each do |(key, value)|
yield({key, value})
end
end
def clean
@hash.each_key do |k|
@hash.delete(k) if !k.value
end
end
end
class Graph
def initialize
@nodes = Hash(Node,Bool).new
end
def add_node : Node
n = Node.new
@nodes[n] = true
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node)
node.inspect # on strong type?!
nil
end
class Node
end
end
def fn(g, h)
n = g.add_node
h[n] = "foo"
g.remove_node(n)
end
g = Graph.new
h = WeakMap(Graph::Node, String).new
fn(g,h)
GC.collect
h.clean
p h.size # should be 0, but is 1
ok, this time both memory management and semantics play
(sorry, just had to delete two stupid posts of mine)
require "weak_ref"
class WeakKeyMap(K, V)
# precondition (1): do _not_ subclass WeakRef(T)
private class Key(T)
def initialize(k : T)
@k = WeakRef(T).new(k)
end
def value
@k.value
end
def ==(other : Key(T)) # this is important for semantics
value == other.value
end
# precondition (2): do _not_ use this
# def_equals_and_hash @k.value
end
include Enumerable({Key(K), V})
@hash = Hash(Key(K), V).new
def initialize
# since...
# - GC has no hooks for cleaning
# - Hash iterators are all private, so we cannot store the iterator state here
# ... we have to use fibers
spawn do
while true
@hash.each do |(key, value)|
# precondition (3a): remove keys regularly
@hash.delete(key) if !key.value
Fiber.yield
end
Fiber.yield
end
end
end
def []=(key : K, value : V)
@hash[Key.new(key)] = value
end
def []?(key : K) : V|Nil
@hash[Key.new(key)]?
end
def [](key : K) : V
self[key]?.as(V) # V may or may not include Nil
end
def delete(key : K)
@hash.delete(Key.new(key))
end
def each
@hash.each do |(key, value)|
# precondition (3b): remove keys regularly
@hash.delete(key) if !key.value
yield({key, value})
end
end
end
class Graph
def initialize
@nodes = Set(Node).new
end
def add_node : Node
n = Node.new
@nodes.add(n)
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node)
nil
end
class Node
end
end
def fn(g, h)
n1 = g.add_node
h[n1] = "foo"
raise "fail" if h[n1]!="foo"
g.remove_node(n1)
n2 = g.add_node
h[n2] = "bar"
raise "fail" if h[n2]!="bar"
g.remove_node(n2)
n3 = g.add_node
h[n3] = "foo1"
raise "fail" if h[n3]!="foo1"
g.remove_node(n3)
n4 = g.add_node
h[n4] = "foo2"
raise "fail" if h[n4]!="foo2"
g.remove_node(n4)
n5 = g.add_node
h[n5] = "foo3"
raise "fail" if h[n5]!="foo3"
g.remove_node(n5)
n6 = g.add_node
h[n6] = "foo4"
raise "fail" if h[n6]!="foo4"
g.remove_node(n6)
end
g = Graph.new
h = WeakKeyMap(Graph::Node, String).new
fn(g,h)
p h.size # 6
GC.collect
4.times {Fiber.yield}
p h.size # 2
2.times {Fiber.yield}
p h.size # 0
My last “solution” doesn’t work on more complex data structures (not included here).
This is obviously due to my missing hash(hasher) method. If I include it, the semantics are fine, but the memory leak is back.
Now, even if I try to trick the GC as in the following code, I cannot find a way to do so - it even seems to be independent of the offset I add (0x1234… wouldn’t be helping anyhow)…
require "weak_ref"
# trying to work around the GC
lib C
# in C: void *memcpy(void *dest, const void *src, size_t n);
fun memcpy(dest : UInt64*, src : UInt64*, n : UInt32)
end
class WeakKeyMap(K, V)
# precondition (1): do _not_ subclass WeakRef(T)
private class Key(T)
def initialize(k : T)
@k = WeakRef(T).new(k)
end
def value
@k.value
end
def ==(other : Key(T)) # this is important for semantics
value == other.value
end
def hash(hasher)
v1 = @k.value.object_id
v2 = 0u64
C.memcpy(pointerof(v2), pointerof(v1), 4) # trying to work around GC logic
v2 += 0x1234567812345678u64 # doesn't remove memory leak
# v2 = 0u64 # uncommenting removes memory leak
v2.hash(hasher)
end
# precondition (2): do _not_ use this
# def_equals_and_hash @k.value
end
include Enumerable({Key(K), V})
@hash = Hash(Key(K), V).new
def initialize
# since...
# - GC has no hooks for cleaning
# - Hash iterators are all private, so we cannot store the iterator state here
# ... we have to use fibers
spawn do
while true
@hash.each do |(key, value)|
# precondition (3a): remove keys regularly
@hash.delete(key) if !key.value
Fiber.yield
end
Fiber.yield
end
end
end
def []=(key : K, value : V)
@hash[Key.new(key)] = value
end
def []?(key : K) : V|Nil
@hash[Key.new(key)]?
end
def [](key : K) : V
self[key]?.as(V) # V may or may not include Nil
end
def delete(key : K)
@hash.delete(Key.new(key))
end
def each
@hash.each do |(key, value)|
# precondition (3b): remove keys regularly
@hash.delete(key) if !key.value
yield({key, value})
end
end
end
class Graph
def initialize
@nodes = Set(Node).new
end
def add_node : Node
n = Node.new
@nodes.add(n)
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node)
nil
end
class Node
end
end
def fn(g, h)
n1 = g.add_node
h[n1] = "foo"
raise "fail" if h[n1]!="foo"
g.remove_node(n1)
n2 = g.add_node
h[n2] = "bar"
raise "fail" if h[n2]!="bar"
g.remove_node(n2)
n3 = g.add_node
h[n3] = "foo1"
raise "fail" if h[n3]!="foo1"
g.remove_node(n3)
n4 = g.add_node
h[n4] = "foo2"
raise "fail" if h[n4]!="foo2"
g.remove_node(n4)
n5 = g.add_node
h[n5] = "foo3"
raise "fail" if h[n5]!="foo3"
g.remove_node(n5)
n6 = g.add_node
h[n6] = "foo4"
raise "fail" if h[n6]!="foo4"
g.remove_node(n6)
end
g = Graph.new
h = WeakKeyMap(Graph::Node, String).new
fn(g,h)
p h.size # 6
GC.collect
4.times {Fiber.yield}
p h.size # 2
2.times {Fiber.yield}
p h.size # 0
After some more debugging it turned out that it’s not the GC’s fault, but the Key(T) has an issue.
The hash method just calculates different hashes for the keys before and after GC’ing, so it never succeeds removing the keys. Now I cache the latest Hasher and this seems to work for all my examples.
require "weak_ref"
class WeakKeyMap(K, V)
# precondition: do _not_ subclass WeakRef(T)
private class Key(T)
@hasher = Crystal::Hasher.new
def initialize(k : T)
@k = WeakRef(T).new(k)
end
def value
@k.value
end
def ==(other : Key(T))
value == other.value
end
def hash(hasher)
# this is crucial, since after GC is setting .value to nil, the hashes are different and deletion doesn't work anymore
if @k.value
@hasher = @k.value.hash(hasher)
else
@hasher
end
end
end
include Enumerable({Key(K), V})
@hash = Hash(Key(K), V).new
def initialize
# since...
# - GC has no hooks for cleaning
# - Hash iterators are all private, so we cannot store the iterator state here
# ... we have to use fibers
spawn do
while true
@hash.each do |(key, value)|
@hash.delete(key) if !key.value
Fiber.yield
end
Fiber.yield
end
end
end
def []=(key : K, value : V)
@hash[Key.new(key)] = value
end
def []?(key : K) : V|Nil
@hash[Key.new(key)]?
end
def [](key : K) : V
self[key]?.as(V) # V may or may not include Nil
end
def delete(key : K)
@hash.delete(Key.new(key))
end
def each
@hash.each do |(key, value)|
@hash.delete(key) if !key.value
yield({key, value})
end
end
end
class Graph
def initialize
@nodes = Set(Node).new
end
def add_node : Node
n = Node.new
@nodes.add(n)
n
end
def remove_node(node : Node) : Nil
@nodes.delete(node)
nil
end
class Node
end
end
def fn(g, h)
n1 = g.add_node
h[n1] = "foo"
raise "fail" if h[n1]!="foo"
g.remove_node(n1)
n2 = g.add_node
h[n2] = "bar"
raise "fail" if h[n2]!="bar"
g.remove_node(n2)
n3 = g.add_node
h[n3] = "foo1"
raise "fail" if h[n3]!="foo1"
g.remove_node(n3)
n4 = g.add_node
h[n4] = "foo2"
raise "fail" if h[n4]!="foo2"
g.remove_node(n4)
n5 = g.add_node
h[n5] = "foo3"
raise "fail" if h[n5]!="foo3"
g.remove_node(n5)
n6 = g.add_node
h[n6] = "foo4"
raise "fail" if h[n6]!="foo4"
g.remove_node(n6)
end
g = Graph.new
h = WeakKeyMap(Graph::Node, String).new
fn(g,h)
p h.size # 6
GC.collect
4.times {Fiber.yield}
p h.size # 2
2.times {Fiber.yield}
p h.size # 0
Storing the entire Hasher value is not a good idea. This retains information about data outside this particular Key instance. The value of the hasher argument is derived from anything that was hashed before this particular Key instance. As part of a larger structure that is hashed as a whole, this practice prevents changes in anything that was hashed before a Key instance to embody. So the hash information would be wrong.
A better solution is to store a sub hash which only covers the key value.
def hash(hasher)
if value = @k.value
key_hasher = value.hash(Crystal::Hasher.new)
@hasher = key_hasher
else
key_hasher = @hasher
end
key_hasher.hash(hasher)
end
require "weak_ref"
class WeakKeyMap(K, V)
private class Key(T)
@hasher = Crystal::Hasher.new
def initialize(k : T)
@k = WeakRef(T).new(k) # using WeakRef(T) because subclassing doesn't work
end
def value
@k.value
end
def ==(other : Key(T))
value == other.value
end
def hash(hasher)
if value = @k.value
key_hasher = value.hash(Crystal::Hasher.new)
@hasher = key_hasher
else
# this is crucial, since after GC is setting .value to nil, the hashes are different and deletion doesn't work anymore
key_hasher = @hasher
end
key_hasher.hash(hasher)
end
end
include Enumerable({Key(K), V})
@hash = Hash(Key(K), V).new
def initialize
# since...
# - GC has no hooks for cleaning
# - Hash iterators are all private, so we cannot store the iterator state here
# ... we have to use fibers
spawn do
while true
@hash.each do |(key, value)|
@hash.delete(key) if !key.value
Fiber.yield
end
Fiber.yield
end
end
end
def []=(key : K, value : V)
@hash[Key.new(key)] = value
end
def []?(key : K) : V|Nil
@hash[Key.new(key)]?
end
def [](key : K) : V
self[key]?.as(V) # V may or may not include Nil
end
def delete(key : K)
@hash.delete(Key.new(key))
end
def each
@hash.each do |(key, value)|
@hash.delete(key) if !key.value
yield({key, value})
end
end
end
The only (minor) drawback is the ugly purging of the deleted keys. GC (post) hooks would be fine here, but monkey-patching doesn’t help.
Sorry, I just noticed the final expression in my previous comment was wrong: hasher.hash(key_hasher). It needs to be the other way round: key_hasher.hash(hasher).
Regarding the GC having no hooks, there is the undocumented GC.before_collect method you can pass a block that will execute … before the collection. So it will be one off in the collection cycle if you do clean ups there.
Thanks, the “one off” shouldn’t really hurt, you’re right.
But it doesn’t seem to work for me (Crystal 1.8.1 on Ubuntu)…
class X
def initialize
GC.before_collect do
end
end
end
X.new
GC.collect
… leads to…
Stack overflow (e.g., infinite or very deep recursion)
[0x55c2e81c47c6] *Exception::CallStack::print_backtrace:Nil +118 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x55c2e81b3ff6] ~procProc(Int32, Pointer(LibC::SiginfoT), Pointer(Void), Nil) +310 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x7fdd94d39980] ?? +140589661395328 in /lib/x86_64-linux-gnu/libpthread.so.0
[0x55c2e81b13c1] ~procProc(Nil) +17 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x55c2e81b14c7] ~procProc(Nil) +279 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp (140587164552344 times)
[0x55c2e823dd10] GC_mark_some +1392 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x55c2e823693d] ?? +94295606651197 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x55c2e8236719] GC_try_to_collect_inner +329 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x55c2e82374a8] ?? +94295606654120 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x55c2e823751a] GC_gcollect +10 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x55c2e81d9346] *GC::collect:Nil +6 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x55c2e81a3d2b] __crystal_main +1051 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x55c2e8235c96] *Crystal::main_user_code<Int32, Pointer(Pointer(UInt8))>:Nil +6 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x55c2e8235c0a] *Crystal::main<Int32, Pointer(Pointer(UInt8))>:Int32 +58 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x55c2e81b12d6] main +6 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
[0x7fdd940e2c87] __libc_start_main +231 in /lib/x86_64-linux-gnu/libc.so.6
[0x55c2e81a382a] _start +42 in .../snap/crystal/common/.cache/crystal/crystal-run-gc.tmp
Hm… right, sorry. GC.before_collect might not be ready for this. It seems it does not accept an arbitrary number of calls, only the one that is currently in place. Procs passed end up calling itself because of limitation on how before_collect is currently written to avoid generating closures.
If we need to hook on before collect we need to offer a better API for it.
it just occurred to me that this is also the proper general (long-living) caching data structure (vs. Hash).
Because the GC cleans the cache if the objects are no longer existing.
At least for class instance related data…
We should just remove any enumeration-/indexing-like methods from this class (can lead to non-deterministic behavior), to have it just serve as a mapper.