[RFC] Lazy Objects

Crystal currently provides a few ways to lazily evaluate code, the most common being the block form of the getter macro. It’s also pretty easy to use captured blocks to delay object instantiation until first access, especially via forward_missing_to. E.g.

struct LazyObject(T)
  forward_missing_to self.instance

  # :nodoc:
  delegate :==, :===, :=~, :hash, :tap, :not_nil!, :dup, :clone, :try, to: self.instance

  getter instance : T { @instantiated = true; @loader.call }
  getter? instantiated : Bool = false

  def initialize(@loader : Proc(T)); end
end

The main challenge/annoyance with this is that the underlying code has to be aware of the laziness. E.g. you have to type everything as LazyObject(Article) vs just as Article, even if the former is totally compatible with the latter (of which [RFC] Standardized Abstractions is a similar use case).
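To make the typing friction concrete, here is a minimal sketch using a trimmed-down version of the LazyObject above. The Article class and the print_title method are hypothetical names used only for illustration.

```crystal
# Hypothetical entity used only for this illustration.
class Article
  getter title : String

  def initialize(@title : String); end
end

# Trimmed-down version of the wrapper shown above.
struct LazyObject(T)
  forward_missing_to self.instance

  getter instance : T { @loader.call }

  def initialize(@loader : Proc(T)); end
end

# A method typed against the real class, unaware of any laziness.
def print_title(article : Article) : Nil
  puts article.title
end

lazy = LazyObject(Article).new(->{ Article.new("Hello") })

# Method calls are forwarded, so this works.
puts lazy.title

# The wrapper is not an Article as far as the type system is concerned,
# so the following line would not compile if uncommented:
# print_title lazy

# The caller has to unwrap explicitly instead.
print_title lazy.instance
```

So the wrapper behaves like an Article for method calls, but anything with an Article type restriction still rejects it.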

However I recently came across this new PHP feature for Lazy Objects and one part really jumped out at me.

[lazy ghosts] are indistinguishable from an object that was never lazy

What this ultimately means is if you have a function like:

function test(Article $obj)

You can freely pass either a lazy ghost Article or an actual Article instance and it treats them both the same. I’m not sure how/if something like this would be implemented in Crystal land, but it would be a very useful thing to have. A solid use case being ORMs:

# pseudo code
class User
  getter! id : Int64
  getter name : String
end

class Article
  getter! id : Int64
  getter author : User
end

article = Article.find 123

# `Article.author` could be a `User` ghost that is aware of its PK
article.author.id # => 456

# Accessing other fields lazily initializes the ghost which actually performs the query to hydrate the rest of it
article.author.name # => Foo

My current, somewhat naive thinking is that if we had, say, a Ghost(T) type in the stdlib, the compiler could just know Ghost(Article) is compatible with Article. That just isn’t something you can do solely in user code.

But curious to hear others’ thoughts!

So I don’t really see what you gain by doing this rather than just instantiating a real user instead of a ghost when you access the field.

However, the ORM example is also a prime example of why it is problematic - doing automagic calls to a database is a very efficient way to get dogshit performance, as you very easily end up with N+1 queries where you do one query for each record in a collection instead of one query for them all. If it were possible to turn off this magic autoloading from ActiveRecord in Rails I’d be happy, as it would surface many, many performance issues.


The benefit is you’re able to only query for what you need and not be required to load all relationships of a given entity. E.g. if you query for all articles but only need the author’s ID it would be a waste to have extra queries eagerly fetching data you’re not going to use. The ORM I’m used to working with uses ghost/proxy objects as a way to handle that in a pretty performant way, as its default fetch type is LAZY.

Article ID 1 - Author: Fred (1)
Article ID 2 - Author: Fred (1)
Article ID 3 - Author: Jim (2)
Article ID 4 - Author: Fred (1)

Like this example would result in 3 queries:

  1. Fetch all the articles
  2. Fetch author 1
  3. Fetch author 2

So not as bad as actual N+1, but still a valid point. If you know you’ll be using all this data, you can set the fetch mode of the Article.author relationship to EAGER, in which case it can then output the same data with just a single query. I found it nice to work with as you have the flexibility to not have to worry about selecting too much, but also have the option to improve performance by telling it to fetch eagerly; either all the time or on a specific query.
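The three-query count above can be reproduced with a small simulation. This is only a sketch: FakeStore, ArticleRow, and the in-memory AUTHORS table are made-up stand-ins for an ORM, and it assumes the ORM keeps track of which authors it has already loaded.

```crystal
# Hypothetical row types standing in for ORM entities.
record Author, id : Int64, name : String
record ArticleRow, id : Int64, author_id : Int64

# Made-up in-memory "database" that counts how many queries it runs.
class FakeStore
  AUTHORS = {1_i64 => "Fred", 2_i64 => "Jim"}

  getter query_count : Int32 = 0

  # Assumption: authors that were already loaded are reused.
  @loaded_authors = {} of Int64 => Author

  # One query for the whole collection.
  def articles : Array(ArticleRow)
    @query_count += 1
    [
      ArticleRow.new(1_i64, 1_i64),
      ArticleRow.new(2_i64, 1_i64),
      ArticleRow.new(3_i64, 2_i64),
      ArticleRow.new(4_i64, 1_i64),
    ]
  end

  # One query per distinct author, only on first access.
  def author(id : Int64) : Author
    @loaded_authors[id] ||= begin
      @query_count += 1
      Author.new(id, AUTHORS[id])
    end
  end
end

store = FakeStore.new

store.articles.each do |article|
  author = store.author(article.author_id)
  puts "Article ID #{article.id} - Author: #{author.name} (#{author.id})"
end

puts "Queries: #{store.query_count}"
```

With four articles and two distinct authors, the counter ends at 3: one query for the articles plus one per distinct author.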

Another use case for the ghost is that you can have partial objects. Say your Article entity has the contents of the article as a TEXT field. You may not want to load that in all the time, to save memory. So you could leverage a ghost that represents the article, just excluding that one property/having it be fetched only when accessed.
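As a rough user-land approximation of that partial object idea, a single expensive property can be loaded on first access. The Article shape and the body loader block below are hypothetical; a real ORM would run a query where the block builds a string.

```crystal
# Hypothetical entity whose large `body` column is only loaded on demand.
class Article
  getter id : Int64

  @body : String?

  # The captured block stands in for the query that fetches the TEXT column.
  def initialize(@id : Int64, &@body_loader : Int64 -> String); end

  # Loads the body on first access and memoizes it afterwards.
  def body : String
    @body ||= @body_loader.call(@id)
  end
end

loads = 0

article = Article.new(1_i64) do |id|
  loads += 1
  "Body of article #{id}"
end

puts article.id   # does not touch the loader
puts article.body # first access runs the loader
puts article.body # second access reuses the stored value
puts "Loads: #{loads}"
```

This only works because Article itself knows about the laziness, which is exactly the part a ghost would hide.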


However, after sleeping on it I realized that maybe the focus of this RFC shouldn’t really be on lazy objects themselves, but instead related to this idea of having objects that are interchangeable even if they don’t implement an explicit interface.

Ultimately this relates to the “implicit interface” idea back in [RFC] Standardized Abstractions - #3 by jhass. Possibly could have special handling for delegate/forward_missing_to (or maybe some new construct) to let the compiler know an object is compatible?

Another open question is how to handle the partial object side of things. That could probably just be handled in user land easily enough via the getter! macro, for example.

It sounds to me that you are vouching for some lazy semantics in the language.

For example, in Haskell a value of type T is either a T or a thunk that evaluates to a T (if it finishes). This is a core piece of lazy languages. Usually you don’t distinguish between a fully evaluated value and a non-evaluated thunk. You can force evaluation, but that’s it. T and thunk of T are pretty much indistinguishable.

Maybe an alias Lazy(T) = T | Func(T) could be a starting point to allow this flexibility in specific places, as opt-in laziness. You will still need an explicit call to resolve the lazy value. But you actually need some mutation so that once you evaluate the Func you don’t do it again. The alias will not give you that.
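A minimal sketch of that point, written as a small class rather than an alias so there is somewhere to store the result. Lazy and its value method are hypothetical names, Proc(T) plays the role of Func(T), and the sketch assumes T is not itself a Proc.

```crystal
# Holds either an already computed T or a Proc(T) that produces one.
class Lazy(T)
  @value : T | Proc(T)

  def initialize(@value : T | Proc(T)); end

  # Explicitly resolves the value. The proc is replaced by its result,
  # which is the mutation a plain alias cannot provide.
  def value : T
    current = @value

    if current.is_a?(Proc(T))
      @value = current.call
    else
      current
    end
  end
end

calls = 0

deferred = Lazy(Int32).new(->{ calls += 1; 42 })
eager = Lazy(Int32).new(7)

puts deferred.value # runs the proc
puts deferred.value # reuses the stored result
puts eager.value    # was never lazy
puts "Proc calls: #{calls}"
```

Callers still have to go through value explicitly, so this gives memoization but not the indistinguishable-from-T behavior.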

Yea, that’s very similar to how PHP handles their lazy objects as well. Seems to be a pretty powerful feature that has use cases beyond an ORM too. Like is it something that would tie into the promise talk from [RFC] Structured Concurrency · Issue #6468 · crystal-lang/crystal · GitHub? :person_shrugging:.

The alias idea would require Request for generic alias · Issue #2803 · crystal-lang/crystal · GitHub. I’d want to think more about how this would play out. Continuing with the ORM example, if there was a clean way to use the alias on the entity, but then have everything else just be T that could be quite nice. Otherwise if you’d have to use it on everything, would be a bit annoying to deal with.

Would you tho? If you had alias Lazy(T) = T | LazyObject(T) like what was used in the OP could just leverage forward_missing_to and delegate to resolve the value once and then just call it. This works more similarly to PHP’s proxy object than ghosts since you still have the “wrapper” type versus the resulting object just being T after resolving it? Assuming that’s what you mean by “mutation”?

EDIT: NVM, I think you more so meant mutation removing Func(T) from the type once resolved.

Another thing I touched on in my last reply is the partial object idea. In PHP’s implementation it’s possible to do like $reflector->getProperty('id')->setRawValueWithoutLazyInitialization($author, 123); such that if you were to do like article.author.id it wouldn’t load the whole object since that property is already defined. With the alias it’s all or nothing and you don’t get this property-specific laziness, only laziness on the whole object.

So what exactly is stopping you from implementing the behavior you want in the ORM without having to resort to changing the type system? The objects that the ORM expose could be wrappers for fully or partially loaded records, and the actual handling of loading on demand kinda needs to happen anyhow. So I don’t really see what the issue is.

I haven’t fully explored how I’d handle the ORM relationships in the current state of things. But my initial thinking is that native support for this in the compiler would provide a more seamless/better experience than what can currently be done in user land.

Like at the moment I guess you could have a dedicated ORM type like property user : Proxy(User) to act as your proxy. This should overall work okay, but is of course less seamless if you have to do article.user.instance or whatever to actually get a User, or else explicitly type everything as Proxy(User) | User. The alias idea @bcardiff suggested would help with this, but it’s still not quite perfect.

This idea of partial objects also complicates things. Like how to actually implement that in a good way without compiler support? Maybe store the partial values in the wrapper somehow, while using method_missing to generate getters for them? But you would need to store/return them in such a way that the return type doesn’t just become a big union of all the ivars on the obj. Or maybe a lower-level way where the wrapper type news up T in an unsafe way with uninitialized ivars and just lazy loads them as needed. Might play around with this…

EDIT: Or leverage the ! getter/property macros. Might be sufficient :thinking:.
EDIT2: Or maybe this is a good usecase for that .pre_initialize experimental feature

But yea, I just thought PHP’s feature sounded pretty neat so wanted to throw the idea out there. As there are of course other use cases beyond ORMs.

I came up with this:

class User
  getter id : Int64?
  property name : String
  property active : Bool = true

  def initialize(@name : String)
    pp "New #{self.class}"
  end
end

# A proxy represents an object of type `T` that may be lazily, and/or partially, initialized.
# It accepts a block that is captured and is expected to return a new non-lazy instance of `T`.
# The block may use the lazy object when creating the real instance, but must not return the lazy object itself.
#
# The original lazy instance will only have properties with default values set initially.
# All method calls on the proxy are forwarded to the real underlying instance, triggering initialization if it was not already.
#
# Properties set via `#set_value_lazily` are assigned without triggering initialization.
struct Proxy(T)
  private abstract struct PropertyMetadataBase; end

  private struct PropertyMetadata(ObjectType, PropertyType, PropertyIdx) < PropertyMetadataBase
    def set_value(object : ObjectType, value : PropertyType) : Nil
      {% begin %}
        pointerof(object.@{{ObjectType.instance_vars[PropertyIdx].name.id}}).value = value
      {% end %}
    end

    def set_value(object : ObjectType, value : _) : NoReturn
      raise "BUG: Invoked wrong set_value overload."
    end
  end

  # Keeps track of the properties that were lazily set via `#set_value_lazily`.
  @initialized_properties = Set(String).new

  # Hash representing all the ivars of `T` to support setting values lazily easier.
  @properties = Hash(String, PropertyMetadataBase).new

  # The internal instance of `T`.
  # Is assumed to be in a partial state unless `@instantiated` is `true`.
  @instance : T

  getter? instantiated : Bool = false

  # There's prob a more robust way to handle this,
  # but we need to be smart and forward only un-initialized properties to `self.instance`.
  # Otherwise if we know a value was initialized then we can just use the partial `@instance`.
  #
  # For now we'll just assume all non-setters are fair game to check against `@initialized_properties`.
  macro method_missing(call)
  {% begin %}
    {% unless call.name.ends_with?('=') %}
      def {{call.name.id}}
        if @initialized_properties.includes?({{call.name.stringify}})
          return @instance.{{call}}
        end

        self.instance.{{call}}
      end
    {% else %}
      self.instance.{{call}}
    {% end %}
  {% end %}
  end

  def initialize(&@loader : T -> T)
    @instance = T.allocate

    {% for ivar, idx in T.instance_vars %}
      @properties[{{ivar.name.id.stringify}}] = PropertyMetadata(T, {{ivar.type}}, {{idx}}).new
    {% end %}
  end

  # Fully initializes `@instance` if it is not already by calling the `@loader`
  # `@loader` will only be called once.
  def instance : T
    return @instance if @instantiated

    @instance = @loader.call @instance
  ensure
    @instantiated = true
  end

  def to_s(io : IO) : Nil
    io << "proxy " << T
  end

  def inspect(io : IO) : Nil
    io << "proxy " << T
  end

  def set_value_lazily(name : String, value : V) : Nil forall V
    @properties[name].set_value @instance, value
  ensure
    @initialized_properties << name
  end
end

proxy = Proxy(User).new do |partial|
  # Assume user is fetched via DB/ORM.
  # Assumes the PK is loaded in the partial object.
  # user = User.find partial.id
  user = User.new "Fred"
end

# Sets a specific property on the partial object without triggering initialization.
# In an ORM context, the ORM would do this for the primary key identifiers on the proxy object.
# Somewhat breaks down due to it having to return a new instance tho, prob just have to leverage the `PropertyMetadata` context again for that maybe? :shrug:
#
# Can be somewhat annoying since there is no autocasting when doing `Pointer(Int64).value = 123_i32` :/.
proxy.set_value_lazily "id", 2_i64

# Does not trigger initialization
pp proxy.id

# Triggers initialization
pp proxy.name

# Triggers initialization
pp proxy.instance

# Triggers compile time errors still:
pp proxy.blah

# Triggers runtime error, but probably okay assuming this feature
# would be somewhat internal to the ORM.
proxy.set_value_lazily "pi", 3.14

Seems to work fairly well as somewhat of a hybrid between PHP’s Ghost and Proxy. The main points of discussion being:

  • In PHP, once a ghost is initialized, it just turns into the real instance (the equivalent of @instance here), which would def require compiler support. No idea how you’d even do this tho.
lazy ghost object(Example)#3 (0) {
  ["prop"]=>
  uninitialized(int)
}
// Obj is initialized and is no diff than if it was always just `new Example(1)`
object(Example)#3 (1) {
  ["prop"]=>
  int(1)
}
  • It’s not really indistinguishable from User (would fail .is_a? checks and such)
  • It would be nice if there was a way to just have the block be able to do like partial.initialize(...) and retain the existing data versus having to return a new instance. I was able to get this working via a custom method since #initialize is protected, but maybe there’s room for improvement there
    • Related, does using .allocate make sense here? From what I can tell yes, because I want memory allocated/managed by the GC, but just don’t want to call #initialize
  • Kinda had to go a bit low-level/hacky to set partial values. Feels like it should work okay but not my area of expertise
    • Is there a better way to handle the method_missing side of it?
    • Could also prob optimize some of it. Like set some of the ivars to nil to free up memory once @instantiated

What about being able to inherit generic arguments?

class Proxy(T) < T
end