Hello! I am wondering if there is a nice way of parsing mostly-structured YAML config, where the outermost layer contains YAML aliases?
This is easiest to explain with an example:
require "yaml"

class Config
  include YAML::Serializable

  property foo : Int32
  property bar : Int32
end

config_per_environment = Hash(String, Config).from_yaml File.open("config.yml")
puts config_per_environment["dev"]
config.yml:
defaults: &defaults
  foo: 17
dev:
  <<: *defaults
  bar: 54
This fails to parse because defaults does not contain all the fields needed to build a Config. However, all I want is to use it to avoid repetition; I will never read it as a Config.
I could of course use YAML::Any to parse the whole thing, but I would like some semblance of static typing here (to avoid having to manually verify the config at every step).
Ah, I didn’t mention in the original post that there are not necessarily good/valid default values for all fields in the entry, but you want to factor out all the common fields so they are not needlessly repeated.
The only way I can think of is making the missing fields in defaults be nilable.
In fact, this makes sense because if you access config_per_environment["defaults"]… what would you get? You would get a Config with just foo set, and nil for bar.
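A minimal sketch of the nilable approach, assuming bar is the only field that defaults leaves out. The variable names yaml and configs are only for illustration:

```crystal
require "yaml"

# Sketch: `bar` is declared nilable so that the incomplete `defaults`
# entry can still be deserialized as a Config.
class Config
  include YAML::Serializable

  property foo : Int32
  property bar : Int32?
end

yaml = <<-YAML
  defaults: &defaults
    foo: 17
  dev:
    <<: *defaults
    bar: 54
  YAML

configs = Hash(String, Config).from_yaml(yaml)

p configs["defaults"].bar # the field missing from `defaults`
p configs["dev"].foo      # merged in from `defaults`
p configs["dev"].bar
```

The trade-off is that bar is now typed Int32 | Nil everywhere, so every use of it has to handle the nil case, even for entries like dev where it is always set.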
In reality I’ll never want to access the value at “defaults” - I guess what I want is the YAML parser to see that it’s incomplete (or has an alias) and ignore it for everything apart from being referenced by other nodes.
Yes, but the compiler doesn’t know that. The compiler, in a type-safe language, will make sure you handle all cases correctly. The case of accessing “defaults” is always available and so you have to model it like that.
I guess what you can do is:
1. Parse the YAML with YAML.parse.
2. Remove the “defaults” key.
3. Serialize back to YAML.
4. Parse the config objects from the resulting YAML.
If you don’t care about parsing YAML twice, here’s the code to do it:
require "yaml"

class Config
  include YAML::Serializable

  property foo : Int32
  property bar : Int32

  def self.load_hash(yaml)
    # First pass: parse into a generic YAML::Any tree.
    parsed_yaml = YAML.parse(yaml)
    # Drop the incomplete "defaults" entry.
    parsed_yaml.as_h.delete("defaults")
    # Second pass: serialize back to YAML and parse into typed Config objects.
    Hash(String, Config).from_yaml(parsed_yaml.to_yaml)
  end
end

def yaml_definition
  <<-YAML
    defaults: &defaults
      foo: 17
    dev:
      <<: *defaults
      bar: 54
    YAML
end

pp Config.load_hash(yaml_definition)
The issue here is that your YAML document contains a mapping where only some of the values are actually a Config, so it does not align with the default serialization of a hash data structure.
One way to express that is to let the hash’s value type also accept YAML::Any (which will parse any YAML value) and filter those entries out afterwards:
require "yaml"

class Config
  include YAML::Serializable

  property foo : Int32
  property bar : Int32
end

io = IO::Memory.new(<<-YAML)
  defaults: &defaults
    foo: 17
  dev:
    <<: *defaults
    bar: 54
  YAML

config_per_environment = Hash(String, Config | YAML::Any).from_yaml(io).select { |k, v| v.is_a?(Config) }
puts config_per_environment["dev"]
That’s certainly much more efficient than serializing back and forth.
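One caveat with the select-based version: select keeps the hash’s declared value type, so the result is still a Hash(String, Config | YAML::Any) and calling something like .bar on a value will not compile without a further type check. Below is a rough sketch of narrowing the result to Hash(String, Config). The helper name load_configs is hypothetical, and it relies on the same union deserialization as the snippet above:

```crystal
require "yaml"

class Config
  include YAML::Serializable

  property foo : Int32
  property bar : Int32
end

# Hypothetical helper: parses every top-level entry as either a Config
# or an arbitrary YAML value, then keeps only the Config entries.
def load_configs(yaml : String | IO) : Hash(String, Config)
  mixed = Hash(String, Config | YAML::Any).from_yaml(yaml)

  configs = {} of String => Config
  mixed.each do |name, value|
    # is_a? narrows the union, so `value` is a Config when it is stored.
    configs[name] = value if value.is_a?(Config)
  end
  configs
end

configs = load_configs(<<-YAML)
  defaults: &defaults
    foo: 17
  dev:
    <<: *defaults
    bar: 54
  YAML

puts configs["dev"].bar
```

With the narrowed type, configs["dev"].foo and configs["dev"].bar are plain Int32 values, and the defaults entry is no longer reachable through the hash.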
A more optimized version would be a custom deserializer which works similarly to Hash.from_yaml but simply skips all mapping entries that don’t parse as the value type (crude implementation: https://carc.in/#/r/8ra3).
We could consider adding some implementation for this to the stdlib, because I think it might be generally useful.
Thanks everyone for your help! I think the partially-parsing-hash that @straight-shoota made is what I’m looking for. I’ll see if I can put it into a shard that makes some kind of sense, and share that here.