Compound hash keys with `compare_by_identity` members

HertzDevil · May 25, 2023, 3:17pm

Let’s say a reference type defines its own equality and hash methods:

```crystal
class Foo
  property x : Int32

  def initialize(@x)
  end

  def_equals_and_hash x
end
```
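With `def_equals_and_hash x`, two separately allocated `Foo` instances that hold the same `x` are equal and hash identically, yet they are still distinct objects. Here is a quick check. The class is repeated so the snippet runs on its own.

```crystal
class Foo
  property x : Int32

  def initialize(@x)
  end

  def_equals_and_hash x
end

foo1 = Foo.new(1)
foo2 = Foo.new(1)

p foo1 == foo2                     # => true  (equality generated from x)
p foo1.hash == foo2.hash           # => true  (only x is hashed)
p foo1.same?(foo2)                 # => false (two separate allocations)
p foo1.object_id == foo2.object_id # => false
```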

If we want a Hash(Foo, _) to consider only object identity and ignore key equality, we could use Hash#compare_by_identity:

```crystal
foo1 = Foo.new(1)
foo2 = Foo.new(1)

hash = {} of Foo => String
hash.compare_by_identity
hash[foo1] = "a"
hash[foo2] = "b"
hash[foo1] = "c"
hash # => {#<Foo:0x7fdbd05dce70 @x=1> => "c", #<Foo:0x7fdbd05dce60 @x=1> => "b"}
```

This no longer works when the hash key is a compound type that doesn’t define object identity, for example Tuple, because Hash falls back to key equality on keys that don’t respond to #object_id:

```crystal
hash = {} of {Int32, Foo} => String
hash.compare_by_identity
hash[{3, foo1}] = "a"
hash[{3, foo2}] = "b"
hash[{3, foo1}] = "c"
hash # => {{3, #<Foo:0x7fb60bc76e70 @x=1>} => "c"}
```
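The fallback depends on whether the key type has an `#object_id` method. `Foo` inherits one from `Reference`, but `Tuple` is a struct and has none. The method below is a simplified model of that choice and is not the actual `Hash` source. `identity_or_equal?` is a hypothetical name used only for illustration.

```crystal
class Foo
  property x : Int32

  def initialize(@x)
  end

  def_equals_and_hash x
end

# Simplified model (hypothetical, not Crystal's Hash source) of how a
# compare_by_identity hash matches keys: use identity when the key type
# has #object_id, otherwise fall back to ==.
def identity_or_equal?(a, b) : Bool
  if a.responds_to?(:object_id) && b.responds_to?(:object_id)
    a.object_id == b.object_id
  else
    a == b
  end
end

foo1 = Foo.new(1)
foo2 = Foo.new(1)

p foo1.responds_to?(:object_id)             # => true  (Reference defines it)
p({3, foo1}.responds_to?(:object_id))       # => false (Tuple is a struct)
p identity_or_equal?(foo1, foo2)            # => false (compared by identity)
p identity_or_equal?({3, foo1}, {3, foo2})  # => true  (falls back to Tuple#==)
```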

We definitely don’t want Tuple#object_id here because not all Tuples need object identity, and we don’t want special Tuple support in Hash itself either, since Tuple keys do not necessarily compare all of their elements by identity (imagine a Tuple(Foo, Foo) key that compares the two elements by equality and identity respectively). Finally, we want to avoid using Foo#object_id as part of the key, since going from the key to the actual Foo now requires a rather unsafe Pointer(Void).new(key).as(Foo) or some other redundant mechanism.

The solution, it seems, is to use a custom record whose equality respects Foo’s identity:

```crystal
record FooKey, i : Int32, foo : Foo do
  # note: overrides the field-wise `==` and `hash` that the struct generated by `record` already has
  def_equals_and_hash i, foo.object_id
end
```

```crystal
hash = {} of FooKey => String
# note: `hash.compare_by_identity` not needed anymore
hash[FooKey.new(3, foo1)] = "a"
hash[FooKey.new(3, foo2)] = "b"
hash[FooKey.new(3, foo1)] = "c"
hash # => {FooKey(@i=3, @foo=#<Foo:0x7f9775cfae70 @x=1>) => "c", FooKey(@i=3, @foo=#<Foo:0x7f9775cfae60 @x=1>) => "b"}
```

The main drawback is that you might need a separate record type for every compound key shape you want to use. Also, FooKey cannot be destructured the way a tuple can (e.g. each { |(i, foo), value| }).

Is this indeed the best practice? Do you think anything can be done to improve this situation?
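One possible way to address both drawbacks is a single generic wrapper that compares and hashes one reference by identity, combined with ordinary `Tuple` keys. This is an editorial sketch, not something proposed in the thread, and `IdentityKey` is a hypothetical name. Because `Tuple#==` and `Tuple#hash` delegate to each element, only the wrapped elements use identity. No `compare_by_identity` call is needed, and the keys remain tuples that can be destructured. The last part covers the mixed `Tuple(Foo, Foo)` case mentioned earlier.

```crystal
class Foo
  property x : Int32

  def initialize(@x)
  end

  def_equals_and_hash x
end

# Hypothetical wrapper: equality and hashing use the identity of the wrapped
# reference (T is expected to be a reference type), overriding the field-wise
# defaults that every Struct gets.
struct IdentityKey(T)
  getter value : T

  def initialize(@value : T)
  end

  def ==(other : self) : Bool
    @value.same?(other.value)
  end

  def hash(hasher)
    @value.object_id.hash(hasher)
  end
end

foo1 = Foo.new(1)
foo2 = Foo.new(1)

hash = {} of {Int32, IdentityKey(Foo)} => String
hash[{3, IdentityKey.new(foo1)}] = "a"
hash[{3, IdentityKey.new(foo2)}] = "b"
hash[{3, IdentityKey.new(foo1)}] = "c"
p hash.size # => 2

# Keys are plain tuples, so multiple assignment works on them.
hash.each do |key, value|
  i, wrapped = key
  puts "#{i} #{wrapped.value.x} #{value}"
end

# Mixed semantics: first slot compared by equality, second by identity.
mixed = {} of {Foo, IdentityKey(Foo)} => String
mixed[{foo1, IdentityKey.new(foo1)}] = "a"
mixed[{foo2, IdentityKey.new(foo1)}] = "b" # foo2 == foo1 in slot 1: same key, overwrites "a"
mixed[{foo1, IdentityKey.new(foo2)}] = "c" # different object in slot 2: new key
p mixed.size # => 2
```

The trade-off is that every lookup must wrap the same elements consistently. Reading a `Foo` back also goes through `wrapped.value`, although no pointer casts are involved.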

straight-shoota · May 25, 2023, 3:30pm

This seems like a constructed use case. I imagine it would be pretty rare in real-world applications to need such complex keys.
And if you do, I really can’t think of any more practical solution than creating an explicit key type to give it the intended semantics.
Perhaps an alternative would be to allow hooking into the Hash implementation, but I’m not sure that would be a significant improvement (and it would certainly be more complex).

bcardiff · May 25, 2023, 4:58pm

I think it would be better to fail at runtime rather than fall back. The user of the hash would benefit from an early error instead of having to dig into the hash to understand why something is misbehaving.
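As a rough user-level approximation of that suggestion, a helper could reject keys that a `compare_by_identity` hash would otherwise compare by equality. `strict_put` is hypothetical and is not part of Crystal's `Hash` API. It relies only on `Hash#compare_by_identity?` and `responds_to?`.

```crystal
class Foo
  property x : Int32

  def initialize(@x)
  end

  def_equals_and_hash x
end

# Hypothetical helper (not in the standard library): raises instead of
# silently falling back to key equality when identity comparison is impossible.
def strict_put(hash : Hash(K, V), key : K, value : V) : V forall K, V
  if hash.compare_by_identity? && !key.responds_to?(:object_id)
    raise ArgumentError.new("#{K} keys cannot be compared by identity")
  end
  hash[key] = value
end

by_ref = {} of Foo => String
by_ref.compare_by_identity
strict_put(by_ref, Foo.new(1), "a") # accepted: Foo is a Reference

by_tuple = {} of {Int32, Foo} => String
by_tuple.compare_by_identity
rejected = false
begin
  strict_put(by_tuple, {3, Foo.new(1)}, "a")
rescue ex : ArgumentError
  rejected = true
  puts ex.message # names the offending key type
end
```

This only guards writes made through the helper. Lookups with `#[]` would still fall back silently, so a complete fix along these lines would have to live inside `Hash` itself.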
