Co-allocated reference objects

ReferenceStorage(T) exposes the instance representation of a reference type T. The original use case of this type and T.unsafe_construct is to materialize instances of T from memory locations other than the GC heap, such as on the call stack or embedded inside another reference type. However, a different kind of use is also possible where multiple T instances are created side by side from a single allocation. Let’s say we want a fixed-size collection of Foos:

class Foo
  def initialize(@x : Int32, @y : String)
  end
end

foos = Slice.new(10) { |i| Foo.new(i, i.to_s) }

foos[0] # => #<Foo:0x103304e20 @x=0, @y="0">
foos[3] # => #<Foo:0x103304dc0 @x=3, @y="3">
foos[7] # => #<Foo:0x103304d40 @x=7, @y="7">

Behind the scenes, this will call GC.malloc(instance_sizeof(Foo)) 10 times. Because Foo does not require variable-size allocation like String or Log::Metadata, we could easily group all the allocations into one call:

foos = Pointer(ReferenceStorage(Foo)).malloc(10)
10.times { |i| Foo.unsafe_construct(foos + i, i, i.to_s) }

foos.as(Foo)       # => #<Foo:0x10094af20 @x=0, @y="0">
(foos + 3).as(Foo) # => #<Foo:0x10094af50 @x=3, @y="3">
(foos + 7).as(Foo) # => #<Foo:0x10094af90 @x=7, @y="7">

Notice how the gaps between those Foos’ addresses are now smaller, since those instances are indeed stored contiguously:

puts foos.as(UInt8*).to_slice(instance_sizeof(Foo) * 10).hexdump
00000000  b6 00 00 00 00 00 00 00  b8 55 fe 02 01 00 00 00  .........U......
00000010  b6 00 00 00 01 00 00 00  c8 55 fe 02 01 00 00 00  .........U......
00000020  b6 00 00 00 02 00 00 00  a0 6f 30 03 01 00 00 00  .........o0.....
00000030  b6 00 00 00 03 00 00 00  90 6f 30 03 01 00 00 00  .........o0.....
00000040  b6 00 00 00 04 00 00 00  80 6f 30 03 01 00 00 00  .........o0.....
00000050  b6 00 00 00 05 00 00 00  70 6f 30 03 01 00 00 00  ........po0.....
00000060  b6 00 00 00 06 00 00 00  60 6f 30 03 01 00 00 00  ........`o0.....
00000070  b6 00 00 00 07 00 00 00  50 6f 30 03 01 00 00 00  ........Po0.....
00000080  b6 00 00 00 08 00 00 00  40 6f 30 03 01 00 00 00  ........@o0.....
00000090  b6 00 00 00 09 00 00 00  30 6f 30 03 01 00 00 00  ........0o0.....

On each row, b6 00 00 00 is Foo.crystal_instance_type_id, the next 4 bytes are @x, and finally the remaining 8 bytes are @y.object_id; there is no extra bookkeeping data for the allocator between the instances. The foos raw pointer and the unsafe_construct call could be further hidden from the public using a helper type:
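That layout can be cross-checked directly with Crystal's offsetof and instance_sizeof primitives; a quick sketch (redefining the same Foo as above, so it runs standalone):

```crystal
class Foo
  def initialize(@x : Int32, @y : String)
  end
end

# every instance begins with the 4-byte type ID, then the ivars
p offsetof(Foo, @x)    # => 4
p offsetof(Foo, @y)    # => 8
p instance_sizeof(Foo) # => 16
```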

struct ReferenceSlice(T)
  include Indexable(T)

  @buf : ReferenceStorage(T)*

  getter size : Int32

  def initialize(@size : Int32, & : Int32 -> {Tuple, NamedTuple})
    @buf = Pointer(ReferenceStorage(T)).malloc(size)
    size.times do |i|
      args, opts = yield i
      T.unsafe_construct(@buf + i, *args, **opts)
    end
  end

  def unsafe_fetch(index : Int) : T
    (@buf + index).as(T)
  end
end

foos = ReferenceSlice(Foo).new(10) do |i|
  args = {i, i.to_s}
  {args, NamedTuple.new}
end

foos[0]  # => #<Foo:0x10262cf20 @x=0, @y="0">
foos[3]  # => #<Foo:0x10262cf50 @x=3, @y="3">
foos[-2] # => #<Foo:0x10262cfa0 @x=8, @y="8">

Indexing a ReferenceSlice(T) becomes simple pointer arithmetic, and the objects have good spatial locality. In contrast, a Slice(T) always requires dereferencing its buffer pointer; both the buffer itself and the T instances could also be very far apart from one another, depending on what the GC is doing.

The main drawback is that all the objects in a ReferenceSlice must share the same lifetime, since there is only one allocation after all. A live reference to any T is implicitly a live reference to all other Ts in that slice, so extracting just a handful of objects and discarding the rest is a bad idea. Additionally, the GC is not primarily designed for large allocations on the order of several MBs, so slice size can be a concern for fat classes (there are plenty of them among the Crystal AST nodes). Finally, this approach does not readily generalize to a resizable array of references, because calling #realloc on @buf invalidates all the objects immediately: their object_ids would change in place, and all the old Ts would point to freed memory.
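To make the #realloc hazard concrete, here is a sketch; the comments describe what can go wrong, though whether the buffer actually moves on any given call depends on the allocator:

```crystal
class Foo
  def initialize(@x : Int32, @y : String)
  end
end

foos = Pointer(ReferenceStorage(Foo)).malloc(10)
10.times { |i| Foo.unsafe_construct(foos + i, i, i.to_s) }
first = foos.as(Foo) # hold on to the first instance

foos = foos.realloc(20) # grow the buffer; it may move

# If the buffer moved, `first` still points at the old block: its
# object_id no longer matches foos.as(Foo).object_id, and the memory
# behind it has been freed. Every previously extracted T has the same
# problem, which is why #realloc cannot back a resizable collection of
# co-allocated references.
```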

I wonder if there are any practical applications for such a ReferenceSlice type?


These look like fantastic additions to the toolbox of available things. I’m sure applications will show up over time for both variants.

That said, I’m not certain I like the names of either: it is not a storage of references, it is a joint storage of instances, or sets of instances where the backing memory is owned by the collection. And considering references are often stored in slices, that name is no better. I’d suggest something like OwningStorage, OwnerStorage, or InstanceStorage.

But that aside, they would both be really nice additions and I really like the deconstruction of the existing initialization into accessible methods all the way down.

This sounds like Buffer or Vector to me. In the past I did some refactors to split Array’s buffer into this kind of structure, but it was never merged and other optimizations made it harder.

Having this kind of structure makes sense to me. But we can start it as a shard and then merge it later so we can iterate faster.


This sounds like an awesome way to write a specialized database. The concept of a ReferenceSlice would be similar to a page of tuples in Postgres.

I like this ReferenceSlice(T). It would be a nice foundation to write specialized types.

this approach does not readily generalize to a resizable array of references

It can still be dynamic by allocating blocks instead of reallocating a contiguous buffer: it can grow and even shrink and the pointers wouldn’t change… until you start to reorder or move things inside the collection.

def [](index : Int32) : T
  raise IndexError.new unless 0 <= index < @size

  i, j = index.divmod(entries_per_block)
  @blocks.unsafe_fetch(i).unsafe_fetch(j)
end
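A fuller sketch of that block-allocating idea might look like the following; BlockReferences, ENTRIES_PER_BLOCK, and #push are made-up names for illustration, not an existing API:

```crystal
class BlockReferences(T)
  ENTRIES_PER_BLOCK = 64

  @blocks = [] of Pointer(ReferenceStorage(T))
  getter size = 0

  def push(*args, **opts) : T
    i, j = @size.divmod(ENTRIES_PER_BLOCK)
    # growing appends a new fixed-size block; existing blocks and the
    # objects inside them never move, so old references stay valid
    @blocks << Pointer(ReferenceStorage(T)).malloc(ENTRIES_PER_BLOCK) if j == 0
    obj = T.unsafe_construct(@blocks[i] + j, *args, **opts)
    @size += 1
    obj
  end

  def [](index : Int32) : T
    raise IndexError.new unless 0 <= index < @size
    i, j = index.divmod(ENTRIES_PER_BLOCK)
    (@blocks[i] + j).as(T)
  end
end
```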

A minor spin on this theme is to allocate individual objects on demand, but still share the same buffer. That would of course be a specialized arena allocator:

class ReferenceArena(T, N)
  include Indexable(T)

  getter size : Int32 = 0

  @buf = uninitialized ReferenceStorage(T)[N]

  def new(*args, **opts) : T
    raise RuntimeError.new "..." if @size >= N
    obj = T.unsafe_construct(@buf.to_unsafe + @size, *args, **opts)
    @size &+= 1
    obj
  end

  def unsafe_fetch(index : Int) : T
    (@buf.to_unsafe + index).as(T)
  end
end

From here you could turn N into a @capacity instance variable, add multiple “buckets”, etc.

One common question is where T#finalize should be called; because we used unsafe_construct in ReferenceSlice and ReferenceArena, this is now entirely up to the user.
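One option is for the owner to run the finalizers itself. A self-contained sketch, where ReleasableArena and #release are hypothetical names and T is assumed to actually define #finalize:

```crystal
class ReleasableArena(T, N)
  getter size : Int32 = 0

  @buf = uninitialized ReferenceStorage(T)[N]

  def new(*args, **opts) : T
    raise RuntimeError.new "arena full" if @size >= N
    obj = T.unsafe_construct(@buf.to_unsafe + @size, *args, **opts)
    @size &+= 1
    obj
  end

  # run each finalizer manually; the GC never sees these objects as
  # individual allocations, so it will never call T#finalize itself
  def release : Nil
    @size.times { |i| (@buf.to_unsafe + i).as(T).finalize }
    @size = 0
  end
end
```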
