Possible to serialize/deserialize an object like Ruby marshaling?

I would like to relaunch the topic with a new approach I found to implement out-of-the-box serialization/deserialization.

Compared to my previous approach and the stdlib:

  • no monkey-patching for serialization, maybe for deserialization
  • out-of-the-box serialization/deserialization for any type.

require "json"

module Crystalizer::JSON
  extend self

  def serialize(object)
    String.build do |str|
      serialize str, object
    end
  end

  def serialize(io : IO, object : O) forall O
    ::JSON.build(io) do |builder|
      serialize builder, object
    end
  end

  def serialize(builder : ::JSON::Builder, object : Int32)
    object.to_json builder
  end

  def serialize(builder : ::JSON::Builder, object : String)
    object.to_json builder
  end

  def serialize(builder : ::JSON::Builder, object : O) forall O
    builder.object do
      {% for ivar in O.instance_vars %}
        builder.field {{ivar.stringify}} do
          serialize builder, object.@{{ivar}}
        end
      {% end %}
    end
  end

  def deserialize(data : String | IO, *, to type : O.class) forall O
    deserialize ::JSON::PullParser.new(data), type
  end
  
  def deserialize(pull : ::JSON::PullParser, type : O.class) forall O
    {% begin %}
    {% properties = {} of Nil => Nil %}
    {% for ivar in O.instance_vars %}
      {% ann = ivar.annotation(::Serialization) %}
      {% unless ann && ann[:ignore] %}
        {%
          properties[ivar.id] = {
            type:        ivar.type,
            key:         ((ann && ann[:key]) || ivar).id.stringify,
            has_default: ivar.has_default_value?,
            default:     ivar.default_value,
            nilable:     ivar.type.nilable?,
            root:        ann && ann[:root],
            converter:   ann && ann[:converter],
            presence:    ann && ann[:presence],
          }
        %}
      {% end %}
    {% end %}

    {% for name, value in properties %}
      %var{name} = nil
      %found{name} = false
    {% end %}

    pull.read_begin_object
    while !pull.kind.end_object?
      key = pull.read_object_key
      case key
      {% for name, value in properties %}
      when {{value[:key]}}
        raise "duplicated key: #{key}" if %found{name}
        %found{name} = true
        %var{name} = deserialize pull, {{value[:type]}}
      {% end %}
      else raise "#{key} not found"
      end
    end
    
    O.new(
    {% for name, value in properties %}
      {{name}}: %var{name}.as({{value[:type]}}),
    {% end %}
    )
    {% end %}
  end

  def deserialize(pull : ::JSON::PullParser, type : String.class)
    pull.read_string
  end

  def deserialize(pull : ::JSON::PullParser, type : Int32.class)
    v = pull.int_value.to_i32
    pull.read_next
    v
  end
end


annotation Serialization
end

struct Point
  getter x : Int32
  @[Serialization(key: "YYY")]
  getter y : String

  def initialize(@x, @y)
  end
end


struct MainPoint
  getter p : Point

  def initialize(@p)
  end
end


data = %({"p": {"x": 1, "YYY": "abc"}})

point = Crystalizer::JSON.deserialize data, to: MainPoint
puts point # => MainPoint(@p=Point(@x=1, @y="abc"))
#{Crystalizer::YAML,
{Crystalizer::JSON}.each do |type|
  puts type.serialize point

  # ---
  # p:
  #   x: 1
  #   y: abc
  # {"p":{"x":1,"y":"abc"}}
end

This POC is of course only a bare-bones starting point; it is not correct as is.

I would like to hear others’ opinions on this. It could be a shard at first.

One point for which I don’t have a golden solution: how to create an instance of a type T for deserialization?
T.new won’t necessarily take all ivars, and defining any custom method will require monkey-patching either T or Object.

Another point, annotations: we should be able to specify how to serialize/deserialize a type without monkey-patching it with annotations. It can be done with named arguments, or with an object passed as an argument that defines how to (de)serialize.

Maybe the following won’t work for all situations, but this works fairly well for https://github.com/drhuffman12/ai4cr/blob/master/src/ai4cr/neural_network/backpropagation.cr#L93 [better than the marshalling methods that the Ruby version used]:

require "json"
class Foo
  include ::JSON::Serializable
  def initialize(@bar : ...)
  ...
end
...

# then save `Foo.new(...).to_json` to a file or db txt field

# and load via Foo.from_json(previously_saved_json)

However, it doesn’t seem to work so well if Foo has an instance [or class] variable that is defined as a union of types or as a parent type and you try assigning it values of child types.
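
For the parent-type case, one stdlib mechanism that may help is `use_json_discriminator` from `JSON::Serializable`. The following is a minimal sketch, assuming the JSON carries a field naming the concrete type; `Shape`, `Circle`, `Square` and `Drawing` are made-up names for illustration, not taken from the linked project:

```crystal
require "json"

abstract class Shape
  include JSON::Serializable

  # Maps the value of the "type" field to the concrete subclass to build.
  use_json_discriminator "type", {circle: Circle, square: Square}

  property type : String
end

class Circle < Shape
  property radius : Int32
end

class Square < Shape
  property side : Int32
end

# An ivar declared with the parent type, holding values of child types.
class Drawing
  include JSON::Serializable

  property shapes : Array(Shape)
end

json = %({"shapes": [{"type": "circle", "radius": 2}, {"type": "square", "side": 3}]})
drawing = Drawing.from_json(json)
drawing.shapes.map(&.class) # => [Circle, Square]
```

This only works when the data includes such a discriminator field, so it does not cover arbitrary unions.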

Is there [community interest in creating] a common list [specs?] of what should be serializable that these ‘Better Serializable’ libs could be run against?

The problem with this approach, IMO, is that things are still tightly coupled to JSON. If you wanted to support YAML, you would essentially have to duplicate the entire module. This is fine for things that are specific to JSON, like the (de)serialization logic, but isn’t ideal for things that aren’t going to change between formats, for example the main deserialize method that handles the name of the key to use.

I’m also not sure blindly serializing all properties of a type is the best idea. I think it would be better to take a more explicit approach, only serializing things that opt into it. Take this for example:

# Imagine it's the base for some ORM
# Inside it could have common internal properties
class Base
  # Such as an array of errors
  getter errors : Array(Error) = [] of Error
end

# Now you extend this type to create a model class
class User < Base
  getter id : Int32
  getter name : String
end

# You now go to serialize a user object,
# but notice it includes things from the base class
User.new.to_json # => {"id": 1, "name": "Jim", "errors":[]}

If a goal of this POC is for

How would you go about excluding the errors without monkey-patching Base, or redefining @errors in every subclass with an annotation or something?
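
One possible answer, as a minimal sketch rather than what Crystalizer or the stdlib actually does: make serialization opt-in through a marker annotation, so that inherited ivars such as `@errors` are skipped unless annotated. The `Expose` annotation and the `OptInJSON` module are hypothetical names, `Array(String)` stands in for the `Array(Error)` above, and only values that already respond to `to_json` are handled:

```crystal
require "json"

# Hypothetical marker annotation: only ivars carrying it are serialized.
annotation Expose
end

module OptInJSON
  extend self

  def serialize(object : O) : String forall O
    ::JSON.build do |builder|
      builder.object do
        # Walk every ivar at compile time, including inherited ones,
        # and emit a field only for those annotated with @[Expose].
        {% for ivar in O.instance_vars %}
          {% if ivar.annotation(::Expose) %}
            builder.field {{ ivar.name.stringify }}, object.@{{ ivar.name }}
          {% end %}
        {% end %}
      end
    end
  end
end

class Base
  # Internal state, not annotated, so never serialized.
  getter errors : Array(String) = [] of String
end

class User < Base
  @[Expose]
  getter id : Int32
  @[Expose]
  getter name : String

  def initialize(@id, @name)
  end
end

puts OptInJSON.serialize(User.new(1, "Jim")) # => {"id":1,"name":"Jim"}
```

Base is left untouched here; the cost is that every model has to annotate the ivars it wants to expose.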

Not to steal the show, but I’ve also been working on a new serialization shard. An evolution of Possible to serializer/deserialize and object like Ruby marshaling? - #20 by Blacksmoke16 you could say. Within it, I made an effort to abstract the data from the format that it should be (de)serialized into/from.

This is achieved by defining some methods, when you include the module, that return an array of metadata objects holding information about each property, such as internal/external names, type, owning classes, etc. Annotations/macros are used to filter out unwanted properties at compile time.

The benefit of this is that the object itself doesn’t need to know what format it will be (de)serialized into/from. The format implementations just need to handle working with a common interface. New formats can be added without altering any model code or using format-specific annotations.

The way I handled this is similar to what I did with the metadata objects. I created an abstraction around the data. Currently this is just built on top of the existing JSON/YAML parsing logic and is essentially just JSON::Any | YAML::Any. In the future a more robust abstraction could be implemented if needed, but this is sufficient for now.

Since there is now a singular interface for working with the data, I was able to have my module define a single initializer that takes a type used for deserialization, the metadata objects related to this type, and the data.

def initialize(navigator : ASR::Navigators::DeserializationNavigatorInterface, properties : Array(ASR::PropertyMetadataBase), data : ASR::Any)
  ...
end

Essentially similar to what JSON::Serializable does, but in a more generic way.

Not sure I follow what your point is there. I would say that’s a valid way to store the state of an object for use at a later point yes.

As I said, there will be annotations/arguments to tell which ivars have to be serialized.

The custom initialize will need to be monkey-patched onto the object we want to deserialize - as I suspected, this can’t work out-of-the-box for any object :confused:

For the deserialize logic, it can be put in a common base module that will be included in JSON, YAML, etc.
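
A rough sketch of what sharing the format-independent part could look like, shown for the serialization direction to keep it short. This is an assumption about one possible layout, not the final design: `CommonFields`, `FlatJSON` and `FlatYAML` are hypothetical names, and only flat objects with scalar ivars are handled. The key-naming logic lives in one place and each format only decides how to write a key/value pair:

```crystal
require "json"
require "yaml"

annotation Serialization
end

# Shared, format-independent part: resolves the external key of each
# ivar (honoring the annotation) and yields it together with its value.
module CommonFields
  extend self

  def each_field(object : O, &) forall O
    {% for ivar in O.instance_vars %}
      {% ann = ivar.annotation(::Serialization) %}
      {% unless ann && ann[:ignore] %}
        yield {{ ((ann && ann[:key]) || ivar.name).id.stringify }}, object.@{{ ivar.name }}
      {% end %}
    {% end %}
  end
end

# Format-specific parts: each one only knows how to emit a pair.
module FlatJSON
  def self.serialize(object) : String
    JSON.build do |json|
      json.object do
        CommonFields.each_field(object) { |key, value| json.field key, value }
      end
    end
  end
end

module FlatYAML
  def self.serialize(object) : String
    YAML.build do |yaml|
      yaml.mapping do
        CommonFields.each_field(object) do |key, value|
          yaml.scalar key
          yaml.scalar value
        end
      end
    end
  end
end

struct Point
  getter x : Int32
  @[Serialization(key: "YYY")]
  getter y : String

  def initialize(@x, @y)
  end
end

puts FlatJSON.serialize(Point.new(1, "abc")) # => {"x":1,"YYY":"abc"}
puts FlatYAML.serialize(Point.new(1, "abc"))
```

Adding another format would then mean writing one more small module, without touching `CommonFields` or the model.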

That’s a good idea, I am also thinking of separating the actual objects from how they are (de)serialized. This is even more important because not all formats are the same, and they can require different properties.

I found a way to instantiate objects without monkey patching them (of course, that’s unsafe):

class Point
  getter x : Int32
  getter y : String

  def initialize(@x, @y)
  end
end

instance = Point.allocate

GC.add_finalizer(instance) if instance.responds_to?(:finalize)
pointerof(instance.@x).value = 1
pointerof(instance.@y).value = "abc"

p! instance

Sadly this assumes there is an initializer argument for each ivar. I don’t think there is a way around needing to add some custom initializer for the deserialization process :/.

I released Crystalizer, which brings out-of-the-box [de]serialization for any type from/to YAML/JSON.
A common core is used, no monkey patching, shared annotations, and virtually anyone can use the library interface to add support for new formats.
