YAML Aliases with Serializable

Hello! Is there a nice way of parsing mostly-structured YAML config where the outermost layer contains YAML aliases?

This is easiest to explain with an example:

require "yaml"
class Config
  include YAML::Serializable
  property foo : Int32
  property bar : Int32
end

config_per_environment = Hash(String, Config).from_yaml File.read("config.yml")
puts config_per_environment["dev"]

config.yml:

defaults: &defaults
  foo: 17

dev:
  <<: *defaults
  bar: 54

This fails to parse because defaults does not contain all the fields needed to build a Config (bar is missing). However, I only use it to avoid repetition and will never read it as a Config.

I could of course use YAML::Any to parse the whole thing, but I would like some semblance of static typing here (to avoid having to manually verify the config at every step).

Any suggestions would be appreciated!

I would just list all defaults under the defaults item. Why not make the defaults explicit and directly visible to anyone who edits the config file?
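To illustrate that suggestion: if defaults lists every field, it is itself a valid Config and the original Hash(String, Config) parse should work unchanged. The value bar: 0 below is a made-up placeholder, and the sketch assumes the <<: merge key is written before the keys that override it.

```crystal
require "yaml"

class Config
  include YAML::Serializable
  property foo : Int32
  property bar : Int32
end

# Hypothetical config: defaults now lists every field (bar: 0 is a placeholder),
# so it is a complete Config on its own.
yaml = <<-YAML
  defaults: &defaults
    foo: 17
    bar: 0
  dev:
    <<: *defaults
    bar: 54
  YAML

config_per_environment = Hash(String, Config).from_yaml(yaml)
# bar: 54 comes after the merge key, so it overrides the merged bar: 0.
pp config_per_environment["dev"]
```

The trade-off is that a placeholder like bar: 0 has to be a legal value, which is exactly what is not available when there is no sensible default.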

Ah, I didn’t mention in the original post that there are not necessarily good/valid default values for all fields in the entry, but I want to factor out all the common fields so they are not needlessly repeated.

The only way I can think of is to make the fields that are missing from defaults nilable.

In fact, this makes sense because if you access config_per_environment["defaults"]… what would you get? You would get a Config with just foo set, and nil for bar.
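A minimal sketch of the nilable approach, assuming bar is the only field that can be missing from defaults. The only change from the question's Config is the type of bar.

```crystal
require "yaml"

class Config
  include YAML::Serializable
  property foo : Int32
  # Nilable, so a mapping without bar (like defaults) still parses; bar becomes nil.
  property bar : Int32?
end

yaml = <<-YAML
  defaults: &defaults
    foo: 17
  dev:
    <<: *defaults
    bar: 54
  YAML

config_per_environment = Hash(String, Config).from_yaml(yaml)

# Every Config now has a nilable bar, so callers must handle nil explicitly.
if bar = config_per_environment["dev"].bar
  puts "dev.bar = #{bar}"
end
pp config_per_environment["defaults"].bar
```

The cost shows up at every call site: dev.bar is Int32? too, so the compiler makes you handle nil even for environments that are complete.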

In reality I’ll never want to access the value at “defaults”. I guess what I want is for the YAML parser to see that it’s incomplete (or only serves as an anchor) and ignore it, except as the target of references from other nodes.

Yes, but the compiler doesn’t know that. In a type-safe language, the compiler makes sure you handle all cases. Accessing “defaults” is always possible, so you have to model that case.

I guess what you can do is:

  • parse the yaml with YAML.parse
  • remove the “defaults” key
  • serialize back to yaml
  • parse the config objects from that resulting yaml

If you don’t care about parsing YAML twice, here’s the code to do it:

require "yaml"

class Config
  include YAML::Serializable
  property foo : Int32
  property bar : Int32

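  def self.load_hash(yaml)
    # Parse generically first, so the incomplete "defaults" entry doesn't fail
    parsed_yaml = YAML.parse(yaml)
    # Keys of the parsed mapping are YAML::Any, so delete with a YAML::Any key
    parsed_yaml.as_h.delete(YAML::Any.new("defaults"))
    # Serialize the remaining entries and parse them again as typed Configs
    Hash(String, Config).from_yaml(parsed_yaml.to_yaml)
  end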
end

def yaml_definition
  <<-YAML
  defaults: &defaults
    foo: 17

  dev:
    <<: *defaults
    bar: 54
  YAML
end

pp Config.load_hash(yaml_definition)

The issue here is that your YAML document contains a mapping where only some values are actually Configs. That does not fit the default deserialization of a Hash, which expects every value to parse as the value type.

One way to express that is to let the hash’s value type also accept YAML::Any (which parses any YAML value) and filter out the non-Config values afterwards:

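require "yaml"

class Config
  include YAML::Serializable
  property foo : Int32
  property bar : Int32
end

io = IO::Memory.new(<<-YAML)
  defaults: &defaults
    foo: 17

  dev:
    <<: *defaults
    bar: 54
  YAML

# Values that don't parse as Config fall back to YAML::Any
raw = Hash(String, Config | YAML::Any).from_yaml(io)

# Keep only the Config values; transform_values narrows the value type
# from Config | YAML::Any to Config, so the result is Hash(String, Config)
config_per_environment = raw.select { |_, v| v.is_a?(Config) }.transform_values(&.as(Config))

puts config_per_environment["dev"]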

That’s certainly much more efficient than serializing back and forth.

A more optimized version would be a custom deserializer that works like Hash.from_yaml but simply skips all mapping entries that don’t parse as the value type (crude implementation: https://carc.in/#/r/8ra3).
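The linked snippet isn't reproduced here. As a rough illustration of the same idea, this is a sketch using a hypothetical wrapper type Lenient(T) and a hypothetical helper parse_lenient_hash (neither is in the stdlib). It hooks into the self.new(ctx : YAML::ParseContext, node : YAML::Nodes::Node) constructor that from_yaml calls for each value, and stores nil instead of raising when a value doesn't parse. It is a wrapper-based variant, not necessarily how the linked implementation works.

```crystal
require "yaml"

class Config
  include YAML::Serializable
  property foo : Int32
  property bar : Int32
end

# Hypothetical wrapper: tries to parse a node as T and stores nil
# instead of raising when the node does not fit T.
struct Lenient(T)
  getter value : T?

  def initialize(@value : T?)
  end

  # Hash(String, Lenient(T)).from_yaml calls this for every mapping value.
  def self.new(ctx : YAML::ParseContext, node : YAML::Nodes::Node)
    new(T.new(ctx, node))
  rescue YAML::ParseException
    new(nil)
  end
end

# Hypothetical helper: parses a top-level mapping in one pass and keeps
# only the entries that parsed as T, returning Hash(String, T).
def parse_lenient_hash(yaml : String, type : T.class) : Hash(String, T) forall T
  result = {} of String => T
  Hash(String, Lenient(T)).from_yaml(yaml).each do |key, wrapped|
    if value = wrapped.value
      result[key] = value
    end
  end
  result
end

yaml = <<-YAML
  defaults: &defaults
    foo: 17
  dev:
    <<: *defaults
    bar: 54
  YAML

config_per_environment = parse_lenient_hash(yaml, Config)
pp config_per_environment["dev"]
```

The result is statically a Hash(String, Config). One design caveat: because every YAML::ParseException is swallowed, an environment with a typo (say, a missing bar in prod) is silently dropped rather than reported, so you may want to check for the environments you expect.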

We could consider adding some implementation for this to the stdlib, because I think it might be generally useful.

Thanks everyone for your help! I think the partially-parsing-hash that @straight-shoota made is what I’m looking for. I’ll see if I can put it into a shard that makes some kind of sense, and share that here.