Possible to serialize/deserialize an object like Ruby marshaling?

paulcsmith · March 18, 2019, 4:50pm

Trying to figure out if this is possible. It doesn’t look like it is, but maybe someone knows of a way

avitkauskas · March 19, 2019, 7:27pm

Have you checked JSON and YAML modules? Isn’t it what you are looking for?
https://crystal-lang.org/api/0.27.2/JSON.html
https://crystal-lang.org/api/0.27.2/YAML.html
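For reference, a minimal sketch of what the JSON module offers through JSON::Serializable. The Point type and its fields are made up for illustration; the mapping is derived from the declared instance variables, but the type still has to opt in by including the module.

```crystal
require "json"

# Hypothetical example type; any type with typed instance variables works.
struct Point
  include JSON::Serializable

  getter x : Int32
  getter y : String

  def initialize(@x : Int32, @y : String)
  end
end

# Deserialize from a JSON string, then serialize back.
point = Point.from_json(%({"x": 1, "y": "abc"}))
puts point.x       # => 1
puts point.to_json # => {"x":1,"y":"abc"}
```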

paulcsmith · March 19, 2019, 7:38pm

Hey @avitkauskas! Those would work but require you to manually specify how to serialize/deserialize. I was hoping there was a way to do it automatically to YAML/JSON/some binary format.

Maybe that just doesn’t exist though

asterite · March 19, 2019, 7:40pm

I proposed this a while ago: https://github.com/crystal-lang/crystal/issues/6309

(there’s no generic binary format like Marshal, but it could be done with this)

But the community and core team rejected it.

I still think any object should be automatically serializable to any format, like in Java and C# (and you can control this serialization with attributes).

paulcsmith · March 19, 2019, 9:34pm

I’ll take a look at Cannon. It seems Procs and a few other things are missing, but Procs are super hard to do anyway. It might work out!

asterite · March 19, 2019, 10:23pm

How do you serialize procs?

Blacksmoke16 · March 20, 2019, 2:01am

There’s also https://github.com/Blacksmoke16/CrSerializer. It’s based on the *::Serializable stuff but with some extra features.

bew · March 20, 2019, 7:07am

There’s something I’d like to see in Crystal (actually I started it, but with all the other things it’s going quite slow…) it’s a serialization mechanism similar to https://serde.rs where you tell how your object is to be serialized but not with which serialization mechanism, and you can use anything like json, yaml, msgpack,… easily…

And like you say @asterite you can configure some things using attributes, like field ignore, additional fields, conversions, custom serialization (still mechanism agnostic)

straight-shoota · March 20, 2019, 8:51am

Yes, that’s something I’d like to see as well. It’s so much hassle when you have a type that should be serializable to different formats.
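To illustrate the hassle, here is a rough sketch of one type serialized to both JSON and YAML with the current stdlib modules. The User type and the "full_name" key are invented for the example; the point is only that each option has to be repeated once per format.

```crystal
require "json"
require "yaml"

# Hypothetical type; every option is declared twice, once per format.
class User
  include JSON::Serializable
  include YAML::Serializable

  @[JSON::Field(key: "full_name")]
  @[YAML::Field(key: "full_name")]
  property name : String

  # Skipped in both formats; nilable so it can stay unset when deserializing.
  @[JSON::Field(ignore: true)]
  @[YAML::Field(ignore: true)]
  property password : String?

  def initialize(@name : String, @password : String? = nil)
  end
end

user = User.new("Ada", "secret")
puts user.to_json # => {"full_name":"Ada"}
puts user.to_yaml
```

Adding a third format would mean a third include and a third annotation on every property.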

asterite · March 20, 2019, 9:54am

So in this generic serialization format… how do you say something should be an XML attribute or an XML element?

bew · March 20, 2019, 10:55am

In this case there would be some specific flags for XML (de-)serializer I guess…

https://github.com/RReverser/serde-xml-rs doesn’t seem to have that, maybe the (de-)serializer is smart for some things? (no time to check right now)

straight-shoota · March 20, 2019, 12:15pm

As far as I understand it, serde-xml-rs uses both attributes and child elements to deserialize an object. Serialization seems to not create any attributes by default. See their test suite.

j8r · March 24, 2019, 4:03pm

I have experimented a bit, and got this:

require "json"

module JSON
  def self.serialize(data, &block)
    pull = PullParser.new(data)
    pull.read_begin_object
    while pull.kind != :end_object
      key = pull.read_object_key
      yield key, pull
    end
  end

  def self.deserialize(**members)
    String.build do |str|
      JSON.deserialize str, **members
    end
  end

  def self.deserialize(io : IO, **members)
    JSON.build(io) do |json|
      members.build json
    end
  end
end

class String
  def self.from(pull : JSON::PullParser)
    pull.read_string
  end

  def build(builder : JSON::Builder)
    to_json builder
  end
end

struct Int32
  def self.from(pull : JSON::PullParser)
    v = pull.int_value.to_i32
    pull.read_next
    v
  end

  def build(builder : JSON::Builder)
    to_json builder
  end
end

struct NamedTuple
  def build(builder : JSON::Builder)
    to_json builder
  end
end

class Object
  def from(type, data)
    {% for ivar in @type.instance_vars %}
    _{{ivar.id}} = nil
    {% end %}

    {% begin %}
    # Standard JSON/YAML etc iterator
    type.serialize(data) do |key, pull|
    case key
    {% for ivar in @type.instance_vars %}
    when {{ivar.stringify}} then _{{ivar.id}} = {{ivar.type.id}}.from(pull)
    {% end %}
    else raise "unknown key: #{key}"
    end
    end

    {{@type.id}}.new(
      {% for ivar in @type.instance_vars %}\
        {{ivar.id}}: _{{ivar.id}}.as({{ivar.type}}),
      {% end %}\
    )
    {% end %}
  end
  
  macro method_missing(build)
  def build(type)
    type.deserialize(
      {% for ivar in @type.instance_vars %}\
        {{ivar}}: @{{ivar}},
      {% end %}
    )
  end
  end
end

record Point, x : Int32, y : String
data = %({"x": 1, "y": "abc"})

point = Point.from(JSON, data)
puts point #=> Point(@x=1, @y="abc")

puts point.build(JSON) #=> {"x":1,"y":"abc"}

j8r · March 24, 2019, 4:12pm

The implementation is imperfect; it only exists to show that it’s possible. We can then implement custom generic annotations, inspired by Serializable.
If we implement a new way to map JSON/YAML, we have to think about how to phase out .mapping and Serializable. It wouldn’t be reasonable to have 3 ways to do the same thing in the stdlib.

Blacksmoke16 · August 15, 2019, 12:02am

I’ve been thinking about this a bit as I came up with a better way to implement CrSerializer that would be more flexible. However I think it’s important for us to define what serialization in Crystal should look like, and the goals we wish to achieve. Then from there we have some criteria to evaluate various implementations.

Some of my thoughts/ideas.

Annotations

Annotations should be used to control how properties get serialized/deserialized in a (mostly) format agnostic way. The current approach of using JSON::Field and YAML::Field is not scalable if every format needs its own annotation.

However, if each format has its own, it would allow you to control how a type is serialized on a per-format basis. I kinda doubt this is common/useful enough to worry about?

Pluggable

Supporting a new format shouldn’t require anything more than defining how each type gets serialized. The “framework” around the serialization logic should be kept separate from the actual implementation.

There will be exceptions to this, for example XML: annotating a property with @[XmlAttribute] would be specific to how that property is serialized in XML

Flexible

The current implementations make it hard to add new features/customization beyond converters. The ideal framework would allow greater control over how, when, and if a property should be serialized.

Having some extra control/flexibility would be great. A few examples would be:

- Serialize based on a version since/until x
- Changing the view based on groups a property is in
- Able to consume a property on deserialization but skip it on serialization
- Something custom the user wants to implement
  - E.x. If a property should be serialized

API

I’m thinking a good api would be like

Serializer.serialize(data : _, format : Format, context : Context? = nil) : R

Where:

- data - The obj/type you want to serialize
- format - An Enum value representing the supported types
- context - A class that could be used to pass generic data to the framework
  - Like which groups to use, or what version, etc
- R - The return type, String, Bytes etc.

Then, we could probably retain the to_json method but have it internally call serialize while passing an optional context object.

Final Thoughts

While this by no means represents what the actual implementation will look like, I think it’s a conversation we should start having sooner rather than later.

I think it’s also important to understand we don’t have to do all of this in macro land. Using macros/annotations to provide “metadata” objects for each property that can then be processed into the final output at runtime is much easier and gives way more flexibility.
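A rough sketch of that split, with all names (Expose, PropertyMetadata, property_metadata, the User type) being hypothetical: the macro only collects per-property metadata, and ordinary runtime code decides what to do with it.

```crystal
# Hypothetical marker annotation for properties that may be serialized.
annotation Expose
end

# Hypothetical runtime description of a single property.
record PropertyMetadata, name : String, type_name : String, exposed : Bool

class User
  @[Expose]
  getter name : String
  getter password : String

  def initialize(@name : String, @password : String)
  end

  # Macro land only gathers metadata; no output format is involved here.
  def property_metadata : Array(PropertyMetadata)
    result = [] of PropertyMetadata
    {% for ivar in @type.instance_vars %}
      result << PropertyMetadata.new(
        name: {{ivar.name.stringify}},
        type_name: {{ivar.type.stringify}},
        exposed: {% if ivar.annotation(Expose) %}true{% else %}false{% end %},
      )
    {% end %}
    result
  end
end

# Runtime code can now filter on the metadata (by group, version, etc.).
user = User.new("Ada", "secret")
exposed_names = user.property_metadata.select(&.exposed).map(&.name)
puts exposed_names # => ["name"]
```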

straight-shoota · August 16, 2019, 8:00am

Annotations

Yes, there should be a generic annotation. I suppose we won’t get around needing some format-specific options as well. But annotation arguments are flexible, so you could just put format-specific options into the generic annotation.

@[Serializer::Field(json_bignum_to_string: true)]
# vs.
@[Serializer::Field]
@[JSON::Field(bignum_to_string: true)]

It might get a bit convoluted though when there are a lot of specifics. But it avoids having duplicate annotation types and questions like if JSON::Field is present, do you still need Serializer::Field as well?

Flexibility & API

These examples look like they’re only specific to an individual serializer implementation. So it could just be kept to that.

The important feature of a serialization framework is to standardize data types and mappings. IMHO it does not need a generic API to dispatch different serializers.
We don’t have to care how an individual serializer is invoked, just provide a basis for it to work on. That’s much more flexible than trying to fit everything in a unified API call, especially for providing custom options.

JSON.serialize(data, **options)
JSON.deserialize(string, **options)

Serde doesn’t have a unified API either. You just call serde_json::to_string and serde_json::from_str.

Blacksmoke16 · August 16, 2019, 4:40pm

I think I’d rather there be separate annotations for each “option”. E.x.

@[Serializer::SerializedName("some_key")]
@[Serializer::Expose]
property some_prop : String

vs

@[Serializer::Field(serialized_name: "some_key", expose: true)]
property some_prop : String

IMO this makes it easier to read as, while there would be more annotations, the intent is more clear on each. It also would be more flexible around adding user defined functionality. I.e. a user could define their own annotation to use which wouldn’t conflict with other keys, and allows greater control over what keys are valid.

A pattern I got into recently was having a class/struct that maps to an annotation, which you could then instantiate like MyCustomAnnotation.new {{ivar.annotation(MyCustom).named_args.double_splat}} (this all gets built at compile time in a macro). You would inherently be able to control the valid fields allowed on the annotation and their types.
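A minimal sketch of that pattern, assuming a made-up MyCustom annotation and a MyCustomConfig struct that mirrors its options. Because the annotation's named arguments are splatted into the struct's constructor, an unknown key or a wrongly typed value should fail at compile time.

```crystal
annotation MyCustom
end

# Hypothetical struct listing the options @[MyCustom] accepts, with defaults.
record MyCustomConfig, serialized_name : String? = nil, expose : Bool = true

class Example
  @[MyCustom(serialized_name: "some_key", expose: false)]
  property some_prop : String = "value"

  # Not annotated, so it gets no config entry.
  property other_prop : Int32 = 0

  # Builds one config object per annotated instance variable.
  def annotation_configs : Hash(String, MyCustomConfig)
    configs = {} of String => MyCustomConfig
    {% for ivar in @type.instance_vars %}
      {% ann = ivar.annotation(MyCustom) %}
      {% if ann %}
        configs[{{ivar.name.stringify}}] = MyCustomConfig.new({{ann.named_args.double_splat}})
      {% end %}
    {% end %}
    configs
  end
end

configs = Example.new.annotation_configs
config = configs["some_prop"]
puts config.serialized_name # => some_key
puts config.expose          # => false
```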

Fair enough, was mainly thinking around how to share the generic portion of each format, but that could easily be done via another method that is used in each format’s implementation. Possibly via a module that can be included into each format’s module. Would also allow that format to modify context if needed before serializing.

I’ve been working on some refactoring to CrSerializer. I’d be happy to hear some thoughts on its implementation when I get it to a good enough place to share.

straight-shoota · August 18, 2019, 9:57pm

I’m pretty sure I’d rather choose the latter style. Declaring serialisation options is a single feature, we shouldn’t have to use a bunch of different annotations for this.

Blacksmoke16 · August 20, 2019, 8:24pm

The current serializer is pretty simple feature wise, with only 5 options you can edit. The biggest downside of having everything in one annotation is it can quickly become unwieldy, especially if additional features are added like groups, versioning, etc.

Either way I don’t think it would change the implementation that much. The question now is how to go about implementing something, the API we want it to have, and how it will be used/fit into the current ecosystem.

Blacksmoke16 · October 26, 2019, 11:27pm

FWIW I refactored my serialization shard to take a more generic approach. There’s some work left to do on the deserialization side of things, but I’m quite happy with how it came out.

https://blacksmoke16.github.io/CrSerializer/CrSerializer.html