Trying to figure out if this is possible. It doesn’t look like it is, but maybe someone knows of a way
Have you checked JSON and YAML modules? Isn’t it what you are looking for?
https://crystal-lang.org/api/0.27.2/JSON.html
https://crystal-lang.org/api/0.27.2/YAML.html
Hey @avitkauskas! Those would work but require you to manually specify how to serialize/deserialize. I was hoping there was a way to do it automatically to YAML/JSON/some binary format.
Maybe that just doesn’t exist though
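For reference, the closest thing in the standard library is probably the Serializable modules. You still opt in per type, but the (de)serialization code is generated from the instance variables. A minimal sketch (the Point type here is just an illustration):

```crystal
require "json"
require "yaml"

# Illustrative type; including the modules is the only manual step.
struct Point
  include JSON::Serializable
  include YAML::Serializable

  getter x : Int32
  getter y : String

  def initialize(@x : Int32, @y : String)
  end
end

point = Point.from_json(%({"x": 1, "y": "abc"}))
puts point.to_json # {"x":1,"y":"abc"}
puts point.to_yaml
```

This covers JSON and YAML only; there is no equivalent for a generic binary format.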
I proposed this a while ago: https://github.com/crystal-lang/crystal/issues/6309
(there’s no generic binary format like Marshal, but it could be done with this)
But the community and core team rejected it.
I still think any object should be automatically serializable to any format, like in Java and C# (and you can control this serialization with attributes).
I’ll take a look at Cannon. It seems Procs and a few other things are missing, but Procs are super hard to do anyway. It might work out!
How do you serialize procs?
There’s also https://github.com/Blacksmoke16/CrSerializer. It’s based on the *::Serializable stuff but with some extra features.
There’s something I’d like to see in Crystal (actually I started it, but with all the other things it’s going quite slow…): a serialization mechanism similar to https://serde.rs, where you tell how your object is to be serialized but not with which serialization mechanism, so you can easily use anything like json, yaml, msgpack,…
And like you say @asterite, you can configure some things using attributes, like field ignore, additional fields, conversions, custom serialization (still mechanism agnostic).
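To make the idea concrete, here is a rough sketch of what "describe once, pick the format later" could look like. All names (FieldWriter, JSONWriter, LineWriter, describe) are made up for illustration, and the "key=value" line format is invented just to have a second backend; this is not how serde or any existing shard does it.

```crystal
require "json"

# Hypothetical format-agnostic interface: a type describes its fields once,
# and each format decides how to write them.
abstract class FieldWriter
  abstract def field(name : String, value : Int32 | String)
end

class JSONWriter < FieldWriter
  def initialize(@json : JSON::Builder)
  end

  def field(name : String, value : Int32 | String)
    @json.field name, value
  end
end

# A made-up "key=value" line format, only to show a second backend.
class LineWriter < FieldWriter
  def initialize(@io : IO)
  end

  def field(name : String, value : Int32 | String)
    @io << name << '=' << value << '\n'
  end
end

record Point, x : Int32, y : String do
  # The only serialization code the type itself carries.
  def describe(writer : FieldWriter)
    writer.field "x", x
    writer.field "y", y
  end
end

point = Point.new(1, "abc")

as_json = JSON.build do |json|
  json.object { point.describe(JSONWriter.new(json)) }
end

as_lines = String.build { |io| point.describe(LineWriter.new(io)) }

puts as_json # {"x":1,"y":"abc"}
puts as_lines
```

In a real implementation the describe method would presumably be generated by a macro from the instance variables and their annotations rather than written by hand.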
Yes, that’s something I’d like to see as well. It’s so much hassle when you have a type that should be serializable to different formats.
So in this generic serialization format… how do you say something should be an XML attribute or an XML element?
In this case there would be some specific flags for XML (de-)serializer I guess…
https://github.com/RReverser/serde-xml-rs doesn’t seem to have that, maybe the (de-)serializer is smart for some things? (no time to check right now)
As far as I understand it, serde-xml-rs uses both attributes and child elements to deserialize an object. Serialization seems to not create any attributes by default. See their test suite.
I have experimented a bit, and got this:
require "json"

module JSON
  def self.serialize(data, &block)
    pull = PullParser.new(data)
    pull.read_begin_object
    while pull.kind != :end_object
      key = pull.read_object_key
      yield key, pull
    end
  end

  def self.deserialize(**members)
    String.build do |str|
      JSON.deserialize str, **members
    end
  end

  def self.deserialize(io : IO, **members)
    JSON.build(io) do |json|
      members.build json
    end
  end
end

class String
  def self.from(pull : JSON::PullParser)
    pull.read_string
  end

  def build(builder : JSON::Builder)
    to_json builder
  end
end

struct Int32
  def self.from(pull : JSON::PullParser)
    v = pull.int_value.to_i32
    pull.read_next
    v
  end

  def build(builder : JSON::Builder)
    to_json builder
  end
end

struct NamedTuple
  def build(builder : JSON::Builder)
    to_json builder
  end
end

class Object
  def from(type, data)
    {% for ivar in @type.instance_vars %}
      _{{ivar.id}} = nil
    {% end %}
    {% begin %}
      # Standard JSON/YAML etc iterator
      type.serialize(data) do |key, pull|
        case key
        {% for ivar in @type.instance_vars %}
          when {{ivar.stringify}} then _{{ivar.id}} = {{ivar.type.id}}.from(pull)
        {% end %}
        else raise "unknown key: #{key}"
        end
      end
      {{@type.id}}.new(
        {% for ivar in @type.instance_vars %}\
          {{ivar.id}}: _{{ivar.id}}.as({{ivar.type}}),
        {% end %}\
      )
    {% end %}
  end

  macro method_missing(build)
    def build(type)
      type.deserialize(
        {% for ivar in @type.instance_vars %}\
          {{ivar}}: @{{ivar}},
        {% end %}
      )
    end
  end
end

record Point, x : Int32, y : String

data = %({"x": 1, "y": "abc"})
point = Point.from(JSON, data)
puts point             #=> Point(@x=1, @y="abc")
puts point.build(JSON) #=> {"x":1,"y":"abc"}
The implementation is imperfect; it only exists to show that it’s possible. We can then implement custom generic annotations, inspired by Serializable.
If we implement a new way to map JSON/YAML, we have to think about how to phase out .mapping and Serializable. It won’t be reasonable to have 3 ways to do the same thing in the stdlib.
I’ve been thinking about this a bit as I came up with a better way to implement CrSerializer that would be more flexible. However, I think it’s important for us to define what serialization in Crystal should look like, and the goals we wish to achieve. Then from there we have some criteria to evaluate various implementations.
Some of my thoughts/ideas.
Annotations
Annotations should be used to control how properties get serialized/deserialized in a (mostly) format agnostic way. The current approach of using JSON::Field and YAML::Field is not scalable if every format needs its own annotation.
However, if each format has its own annotation, it would allow you to control how a type is serialized on a per format basis. I kinda doubt this is common/useful enough to worry about?
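As a rough sketch of the format agnostic variant: one annotation, read once in a macro, producing data that any format could then consume. The Serialized annotation and its key/ignore options are assumptions for this example, not an existing API; the macro logic loosely mirrors how the stdlib reads JSON::Field.

```crystal
require "json"

# Hypothetical format-agnostic annotation; the name and its `key`/`ignore`
# options are assumptions for this sketch.
annotation Serialized
end

class User
  @[Serialized(key: "user_name")]
  property name : String

  @[Serialized(ignore: true)]
  property password : String

  property age : Int32

  def initialize(@name, @password, @age)
  end

  # Collects the serializable properties once; any format could consume this.
  def serialized_fields : Hash(String, String | Int32)
    fields = {} of String => String | Int32
    {% for ivar in @type.instance_vars %}
      {% ann = ivar.annotation(Serialized) %}
      {% unless ann && ann[:ignore] %}
        fields[{{ (ann && ann[:key]) || ivar.name.stringify }}] = @{{ ivar.name }}
      {% end %}
    {% end %}
    fields
  end
end

fields = User.new("ann", "secret", 30).serialized_fields
puts fields.to_json # {"user_name":"ann","age":30}
```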
Pluggable
Supporting a new format shouldn’t require anything more than defining how each type gets serialized. The “framework” around the serialization logic should be kept separate from the actual implementation.
There will be exceptions to this, for example XML: annotating a property with @[XmlAttribute] would be specific to how that property is serialized in XML.
Flexible
The current implementations make it hard to add new features/customization beyond converters. The ideal framework would allow greater control over how, when, and if a property should be serialized.
Having some extra control/flexibility would be great. A few examples would be:
- Serialize based on a version since/until x
- Changing the view based on groups a property is in
- Able to consume a property on deserialization but skip it on serialization
- Something custom the user wants to implement
  - E.g. whether a property should be serialized
API
I’m thinking a good API would be something like
Serializer.serialize(data : _, format : Format, context : Context? = nil) : R
Where:
- data - The obj/type you want to serialize
- format - An Enum value representing the supported types
- context - A class that could be used to pass generic data to the framework
  - Like which groups to use, or what version, etc.
- R - The return type, String, Bytes etc.
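A minimal sketch of that entry point, just to show the shape. Serializer, Format and Context are the hypothetical names from the signature above; the return type is narrowed to String here, and the dispatch simply falls back on the existing to_json/to_yaml methods.

```crystal
require "json"
require "yaml"

# Hypothetical names throughout; none of this is an existing API.
module Serializer
  enum Format
    Json
    Yaml
  end

  # Generic data handed to the framework, e.g. groups or a version.
  record Context, groups : Array(String) = [] of String, version : String? = nil

  def self.serialize(data, format : Format, context : Context? = nil) : String
    # The context is accepted but unused here; a real implementation would
    # consult it to decide which properties to emit.
    case format
    when Format::Json then data.to_json
    when Format::Yaml then data.to_yaml
    else                   raise "unsupported format: #{format}"
    end
  end
end

puts Serializer.serialize({"x" => 1}, Serializer::Format::Json) # {"x":1}
puts Serializer.serialize({"x" => 1}, Serializer::Format::Yaml)
```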
Then, we could probably retain the to_json method but have it internally call serialize while passing an optional context object.
Final Thoughts
While this by no means represents what the actual implementation will look like, I think it’s a conversation we should start having sooner rather than later.
I think it’s also important to understand we don’t have to do all of this in macro land. Using macros/annotations to provide “metadata” objects for each property that can then be processed into the final output at runtime is much easier and gives way more flexibility.
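Roughly what I mean by metadata objects, as a sketch: the macro only collects one small object per property, and plain runtime code decides what to emit. The Since annotation, PropertyMetadata and the visible helper are all hypothetical names for this example.

```crystal
# Hypothetical annotation and metadata type for this sketch.
annotation Since
end

record PropertyMetadata, name : String, value : String, since : Int32?

class Article
  property title : String

  @[Since(2)]
  property summary : String

  def initialize(@title, @summary)
  end

  # Macro land only collects metadata...
  def properties : Array(PropertyMetadata)
    {% begin %}
      [
        {% for ivar in @type.instance_vars %}
          {% ann = ivar.annotation(Since) %}
          PropertyMetadata.new({{ ivar.name.stringify }}, @{{ ivar.name }}.to_s, {{ ann ? ann[0] : nil }}),
        {% end %}
      ] of PropertyMetadata
    {% end %}
  end
end

# ...and ordinary runtime code decides what to emit for a given version.
def visible(obj, version : Int32) : Hash(String, String)
  result = {} of String => String
  obj.properties.each do |prop|
    since = prop.since
    next if since && since > version
    result[prop.name] = prop.value
  end
  result
end

article = Article.new("Title", "Summary")
puts visible(article, 1) # only "title"
puts visible(article, 2) # "title" and "summary"
```

Adding groups or an "until" version would then just be more fields on the metadata object and more conditions in the runtime loop, with no extra macro code.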
Annotations
Yes, there should be a generic annotation. I suppose we won’t get around needing some format-specific options as well. But annotation arguments are flexible, so you could just put format-specific options into the generic annotation.
@[Serializer::Field(json_bignum_to_string: true)]
# vs.
@[Serializer::Field]
@[JSON::Field(bignum_to_string: true)]
It might get a bit convoluted though when there are a lot of specifics. But it avoids having duplicate annotation types and questions like: if JSON::Field is present, do you still need Serializer::Field as well?
Flexibility & API
These examples look like they’re only specific to an individual serializer implementation. So it could just be kept to that.
The important feature of a serialization framework is to standardize data types and mappings. IMHO it does not need a generic API to dispatch different serializers.
We don’t have to care how an individual serializer is invoked, just provide a basis for it to work on. That’s much more flexible than trying to fit everything in a unified API call, especially for providing custom options.
JSON.serialize(data, **options)
JSON.deserialize(string, **options)
Serde doesn’t have a unified API either. You just call serde_json::to_string and serde_json::from_str.
I think I’d rather there be separate annotations for each “option”. E.g.
@[Serializer::SerializedName("some_key")]
@[Serializer::Expose]
property some_prop : String
vs
@[Serializer::Field(serialized_name: "some_key", expose: true)]
property some_prop : String
IMO this makes it easier to read as, while there would be more annotations, the intent is more clear on each. It also would be more flexible around adding user defined functionality. I.e. a user could define their own annotation to use which wouldn’t conflict with other keys, and allows greater control over what keys are valid.
A pattern I got into recently was having a class/struct that maps to an annotation, which you could then instantiate like MyCustomAnnotation.new {{ivar.annotation(MyCustom).named_args.double_splat}} (this all gets built at compile time in a macro). You would inherently be able to control the valid fields allowed on the annotation and their types.
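A small sketch of that pattern, using the MyCustom / MyCustomAnnotation names from above. The label and priority fields and the Task class are invented for the example.

```crystal
annotation MyCustom
end

# Runtime counterpart of the annotation; its initializer defines which
# keys are valid and what types they take (fields are made up here).
record MyCustomAnnotation, label : String, priority : Int32 = 0

class Task
  @[MyCustom(label: "Title", priority: 2)]
  property title : String = ""

  @[MyCustom(label: "Notes")]
  property notes : String = ""

  # Builds one MyCustomAnnotation per annotated ivar at compile time.
  def custom_annotations : Hash(String, MyCustomAnnotation)
    {% begin %}
      {
        {% for ivar in @type.instance_vars %}
          {% ann = ivar.annotation(MyCustom) %}
          {% if ann %}
            {{ ivar.name.stringify }} => MyCustomAnnotation.new({{ ann.named_args.double_splat }}),
          {% end %}
        {% end %}
      } of String => MyCustomAnnotation
    {% end %}
  end
end

annotations = Task.new.custom_annotations
puts annotations["title"].priority # 2
puts annotations["notes"].priority # 0
```

Because the named arguments are passed straight to the initializer, an unknown key or a wrongly typed value on the annotation becomes an ordinary compile error.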
Fair enough, I was mainly thinking about how to share the generic portion of each format, but that could easily be done via another method that is used in each format’s implementation, possibly via a module that can be included into each format’s module. That would also allow the format to modify the context if needed before serializing.
I’ve been working on some refactoring to CrSerializer. I’d be happy to hear some thoughts on its implementation when I get it to a good enough place to share.
I’m pretty sure I’d rather choose the latter style. Declaring serialisation options is a single feature, we shouldn’t have to use a bunch of different annotations for this.
The current serializer is pretty simple feature wise, with only 5 options you can edit. The biggest downside of having everything in one annotation is it can quickly become unwieldy, especially if additional features are added like groups, versioning, etc.
Either way I don’t think it would change the implementation that much. The question now is how to go about implementing something, the API we want it to have, and how it will be used/fit into the current ecosystem.
FWIW I refactored my serialization shard to take a more generic approach. There’s some work left to do on the deserialization side of things, but I’m quite happy with how it came out.
https://blacksmoke16.github.io/CrSerializer/CrSerializer.html