How to model serialization when everything is optional

I’m working on a client for a GraphQL API. One of the hard parts about modeling GraphQL responses in Crystal is that every property of every type is optional, depending on whether it was explicitly requested in the GraphQL query. So to support this, I feel like my options are:

  1. define all properties as optional with getter foo : Foo?, but then consumers of the types have to do nil checks everywhere
  2. define all properties with getter! foo : Foo, which lets them be nil until they’re accessed, but then type checking largely happens at runtime (options 1 and 2 are sketched right after this list)
  3. define bespoke types per query, but then there’s a potential explosion of types
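For concreteness, a minimal sketch of what options 1 and 2 look like for a single property (Widget here is just a placeholder type, not something from my actual schema):

require "json"

struct Widget
  include JSON::Serializable

  getter name : String
end

# Option 1: nilable getter, so every consumer has to nil-check.
struct Result1
  include JSON::Serializable

  getter widget : Widget?
end

# Option 2: getter! stores a nilable ivar but raises NilAssertionError
# at runtime if the property is accessed while it is still nil.
struct Result2
  include JSON::Serializable

  getter! widget : Widget
end

if widget = Result1.from_json(%({})).widget # option 1: explicit nil check
  puts widget.name
end

puts Result2.from_json(%({"widget": {"name": "w"}})).widget.name # option 2: no nil check, raises if absent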

Of those options, #2 seems like the least obtrusive approach to me. Are there other alternatives here?

My 2 cents: if you want type safety, just do the thing that gives you that and go with 3. If you notice you’re ending up with a lot of types, maybe you could boil them down into modules to share parts, or try to re-use queries more, or be more deliberate about what each query asks for instead of having a single larger query?

Could have a little record macro that makes this easier too.
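Something like this, maybe (a rough sketch of a hypothetical json_record macro, loosely modeled on the stdlib record macro; the names are made up):

require "json"

# Hypothetical json_record macro: defines a struct that includes
# JSON::Serializable and declares a getter per type declaration.
macro json_record(name, *properties)
  struct {{name}}
    include JSON::Serializable

    {% for property in properties %}
      getter {{property}}
    {% end %}
  end
end

json_record GetUserResult,
  id : String,
  name : String

pp GetUserResult.from_json(%({"id": "1", "name": "Jamie"})).name # => "Jamie"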

Have you looked at how similar libraries in other statically typed languages handle this?

The nilable type with runtime resolution seems like a good approach to me.
I would consider not using getter! though. That macro is intended for ivars which are usually expected to be non-nil, so that #foo just works. That’s probably not the case with optional API results. So maybe #foo should return the nilable type and #foo! the non-nilable one.
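Roughly something like this (a sketch; api_getter is a made-up macro name):

require "json"

# Sketch of that convention: #foo returns the nilable value, #foo! the
# non-nilable one, raising NilAssertionError if the property wasn't populated.
macro api_getter(decl)
  @{{decl.var.id}} : {{decl.type}}?

  def {{decl.var.id}} : {{decl.type}}?
    @{{decl.var.id}}
  end

  def {{decl.var.id}}! : {{decl.type}}
    @{{decl.var.id}}.not_nil!
  end
end

struct User
  include JSON::Serializable

  api_getter email : String
end

user = User.from_json(%({}))
user.email  # => nil
user.email! # raises NilAssertionError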


Yeah, I was thinking that if I went that route, I’d probably do something like defining different types per route/controller. I like the idea of making a record-like macro for it.

None of the other statically typed languages I’m familiar enough with have a type system like Crystal’s, unfortunately. For example, Go defaults to the type’s zero value and Java lets any reference type be null.

When I say “optional”, I don’t mean it’s optional in the sense that it can be nil when you ask for it, but more about how to handle when you don’t ask for it. For example:

require "json"
require "uuid/json"

struct UserResult
  include JSON::Serializable

  getter user : User
end

struct User
  include JSON::Serializable

  getter id : UUID
  getter name : String
  getter email : String
  getter created_at : Time
end


result = client.graphql.query <<-GQL, {name: "Jamie"}, as: UserResult
  query GetUser($name: String!) {
    user(name: $name) {
      id
      name
      email
      created_at
    }
  }
GQL
user = result.data.user

In this example, the id, name, email, and created_at properties can all be expected to be non-nil when they’re present. But if I didn’t ask for them in the query, they won’t be in the JSON response at all. Since I’m writing both the query and the code that consumes the User object, I feel like I should know whether I asked for each one. That’s the main reason I’m considering getter!.
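With getter!, the example above would look roughly like this (a minimal sketch; the UUID is just a placeholder value):

require "json"
require "uuid/json"

struct User
  include JSON::Serializable

  getter! id : UUID
  getter! name : String
  getter! email : String
  getter! created_at : Time
end

# Pretend only id and name were selected in the query, so email and
# created_at are absent from the JSON and stay nil.
user = User.from_json(%({"id": "b7f1f2de-6d1c-4f3a-9c6c-2f3f2a1b0c9d", "name": "Jamie"}))

user.name   # => "Jamie"
user.email? # => nil
user.email  # raises NilAssertionError at runtime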

This feels a lot like the SELECT clause of a SQL query and mapping the result to a language-level object, but I think for that most folks use ORMs which simply select all the columns of a table or all the ivars of the object because they also generate the query for you.

Yeah, I think I understand the use case. My comment was about the scenario where you want to re-use the same resource type for several queries that may select different properties, so not all ivars may be populated all the time.
That’s what I gather from your problem description. If instead you only have a 1:1 mapping between query and resource class and the query code is static, there’s no issue really, right? That would be option 3 from the OP.

I’m not sure you do understand the use case, then. There are multiple tradeoffs associated with that approach.

One of them is that the person writing the query needs to write every type returned in the response. There could be quite a few types involved in just one query, especially when you have edges { node { ... } } for every connection between entities, which seems to be common with GraphQL. One of my simpler GraphQL queries involves 7 types in the result.
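As an illustrative sketch (not my actual types), a single query that selects a user and their posts through a connection already needs something like this:

require "json"

struct PostsQueryResult
  include JSON::Serializable

  getter! user : UserWithPosts
end

struct UserWithPosts
  include JSON::Serializable

  getter! name : String
  getter! posts : PostConnection
end

struct PostConnection
  include JSON::Serializable

  getter! edges : Array(PostEdge)
end

struct PostEdge
  include JSON::Serializable

  getter! node : Post
end

struct Post
  include JSON::Serializable

  getter! title : String
end

# Five types just to get to result.user.posts.edges.map(&.node.title),
# before pagination info or any other entity comes into play.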

Another is that such an explosion of types has a huge impact on compile times. If you have 500 distinct GraphQL queries throughout your application, that could realistically add 5000 or more JSON::Serializable types to compile unless you are very diligent about finding and reusing them, which comes with its own tradeoffs. The compilation times associated with that can be illustrated with this code:

require "json"

abstract struct Model
  include JSON::Serializable

  # Defines a Model subclass with a getter (and matching ivar) for each
  # type declaration passed in.
  macro define(type, *ivars)
    struct {{type}} < Model
      {% for ivar in ivars %}
        getter {{ivar}}
      {% end %}
    end
  end

  # Defines a Model subclass with `count` String ivars named var1..varN.
  macro create_for_benchmark(type, count)
    {{@type}}.define {{type}},
    {% for i in 1..count %}
      var{{i}} : String{% if i < count %},{% end %}
    {% end %}
  end
end

{% for i in 1..env("MODEL_COUNT").to_i %}
  Model.create_for_benchmark Foo{{i}}, {{env("IVAR_COUNT").to_i}}
{% end %}

# Ensure the compiler compiles all of our objects' `initialize(JSON::PullParser)` methods
{% begin %}
  json = {
    {% for i in 1..env("IVAR_COUNT").to_i %}
      var{{i}}: "asdf",
    {% end %}
  }.to_json

  pp ({{(1..env("MODEL_COUNT").to_i).map { |i| "Foo#{i}" }.join(" | ").id}}).from_json(json)
{% end %}

Comparing compilation times of various permutations of number of types and average number of ivars per type:

# 10 ivars per type
MODEL_COUNT=10 IVAR_COUNT=10 crystal build -s examples/graphql_compile_time.c  2.28s user 0.32s system 167% cpu 1.556 total
MODEL_COUNT=100 IVAR_COUNT=10 crystal build -s  -o example  3.22s user 0.43s system 151% cpu 2.404 total
MODEL_COUNT=500 IVAR_COUNT=10 crystal build -s  -o example  18.27s user 0.94s system 187% cpu 10.251 total

# 20 ivars per type
MODEL_COUNT=10 IVAR_COUNT=20 crystal build -s examples/graphql_compile_time.c  2.85s user 0.44s system 196% cpu 1.673 total
MODEL_COUNT=100 IVAR_COUNT=20 crystal build -s  -o example  7.30s user 0.50s system 155% cpu 5.002 total
MODEL_COUNT=500 IVAR_COUNT=20 crystal build -s  -o example  35.89s user 1.27s system 167% cpu 22.131 total

# 30 ivars per type
MODEL_COUNT=10 IVAR_COUNT=30 crystal build -s examples/graphql_compile_time.c  3.06s user 0.45s system 183% cpu 1.908 total
MODEL_COUNT=100 IVAR_COUNT=30 crystal build -s  -o example  10.06s user 0.58s system 133% cpu 7.975 total
MODEL_COUNT=500 IVAR_COUNT=30 crystal build -s  -o example  61.72s user 1.84s system 150% cpu 42.331 total

Memory usage on the last compilation exceeded 5GB. This is just to compile the types for deserializing query results and doesn’t count the compilation cost for anything else the program does. I couldn’t run a benchmark for 5000 model types because it crashes the compiler.

None of this is to say it’s a bad approach. But there are tradeoffs to consider and I don’t think that’s being understood.


FWIW, I don’t like GraphQL. I think it’s unnecessarily complicated even in a dynamic language like Ruby, and it throws away 25 years of tried-and-true ideas from REST. For example, you lose HTTP Cache-Control because everything’s a POST, so GraphQL’s solution to caching is quite literally “build your own”. Unfortunately, the only API available for some of the things I need to do uses GraphQL.
