How to go about creating a YAML transformer

On one of our projects we use YAML files to store configs, and we want to change the config structure. To keep some backward compatibility, we want to create a transformer that converts a v1 config into a v2 config.

We use the from_yaml method to load the YAML data into a class, for both v1 and v2. Unfortunately, for various reasons a direct class-to-class transformation is not possible, so we need to do something like this:

  1. Load v1 config into v1 class using from_yaml.
  2. Traverse the nested class to create a new temporary Hash-like* object.
  3. Serialize the Hash-like object to String/file.
  4. The String/file gets parsed by v2 config class.

* I say "Hash-like" because I am unsure what the best temporary data structure is. It also has to allow an arbitrary amount of nesting, so it needs to be some kind of recursive type.

And obviously we want as generic a solution as possible.
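For the temporary Hash-like structure, one option (a suggestion, not something the post already uses) is the standard library's YAML::Any: its raw value can already be a Hash(YAML::Any, YAML::Any) or an Array(YAML::Any), so it nests arbitrarily without a custom recursive alias. The sketch below covers steps 2 to 4 under some assumptions: V2Config, to_any and insert_any are hypothetical names, the field names are borrowed from the transformation rules in this post, and leaf values are assumed to arrive as String or Array(String).

```crystal
require "yaml"

# Hypothetical stand-in for the real v2 config class.
class V2Config
  include YAML::Serializable
  include YAML::Serializable::Strict

  class CommonParameters
    include YAML::Serializable
    include YAML::Serializable::Strict

    property white_list_container_names : Array(String)?
    property image_registry_fqdns : Array(String)?
  end

  property common_parameters : CommonParameters
end

# Step 2 helper: wraps a leaf value (as traverse_param might return it) in YAML::Any.
def to_any(value : String | Array(String)) : YAML::Any
  if value.is_a?(String)
    YAML::Any.new(value)
  else
    YAML::Any.new(value.map { |item| YAML::Any.new(item) })
  end
end

# Step 2: inserts `value` at `path`, creating intermediate mappings on demand
# and reusing mappings that earlier rules already created.
def insert_any(root : Hash(YAML::Any, YAML::Any), path : Array(Symbol), value : YAML::Any) : Nil
  current = root
  path[0...-1].each do |segment|
    key = YAML::Any.new(segment.to_s)
    child = current[key]?.try(&.as_h?)
    unless child
      child = {} of YAML::Any => YAML::Any
      current[key] = YAML::Any.new(child)
    end
    current = child
  end
  current[YAML::Any.new(path.last.to_s)] = value
end

root = {} of YAML::Any => YAML::Any
insert_any(root, [:common_parameters, :white_list_container_names], to_any(["coredns"]))
insert_any(root, [:common_parameters, :image_registry_fqdns], to_any(["registry.local"]))

yaml = YAML::Any.new(root).to_yaml # step 3: serialize to a String
config = V2Config.from_yaml(yaml)  # step 4: parse with the (strict) v2 class
```

Because insert_any reuses an existing mapping instead of replacing it, several rules can write under the same parent key (here common_parameters) without overwriting each other.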

What I have:

  1. A function that traverses the nested class based on a list of keys:
      def traverse_param(*params : Symbol)
        traverse_param(params.to_a)
      end

      def traverse_param(params : Array(Symbol))
        # ... (implementation omitted)
        # returns the value at the end of the key chain
      end
  2. Transformation rules, in this format:
      @rules = [] of NamedTuple(old_key: Array(Symbol), new_key: Array(Symbol))

      ...
      {old_key: [:white_list_container_names], new_key: [:common_parameters, :white_list_container_names]},
      {old_key: [:docker_insecure_registries], new_key: [:common_parameters, :docker_insecure_registries]},
      {old_key: [:image_registry_fqdns], new_key: [:common_parameters, :image_registry_fqdns]},
      ...
  3. A recursive type. I know that recursive aliases are not looked upon fondly, but I thought it could work:
      alias NestedHash = Hash(String, String | Array(String) | Hash(String, NestedHash))
      @new_config : NestedHash
  4. A transformation function. Since it's not too long, I will post it in its entirety:
      def transform
        @transformation_rules.each do |rule|
          old_key = rule[:old_key]
          new_key = rule[:new_key]
    
          value = @old_config.traverse_param(old_key)
          unless value.nil?
            insert_into_new_config(new_key, value.as(String | Array(String)))
          end
        end
      end

      private def insert_into_new_config(new_key : Array(Symbol), value : (String | Array(String)))
        # Start at the top level of @new_config
        current_level = @new_config

        new_key.each_with_index do |key, index|
          key_string = key.to_s

          if index == new_key.size - 1
            current_level[key_string] = value
          else
            current_level[key_string] = {} of String => Hash(String, NestedHash)
            current_level = current_level[key_string]
          end
        end
      end

Issues:
I should preface this by saying that the insert_into_new_config function causes the compiler to crash:

    crystal version
    Crystal 1.6.2 [879691b2e] (2022-11-03)

    LLVM: 13.0.1
    Default target: x86_64-unknown-linux-gnu
    crystal build src/cnf-testsuite.cr
    current_branch during compile: "config_transformer"
    current_tag during compile: 
    Invalid memory access (signal 11) at address 0x7fff76e89fe8
    [0xeb6876] ???
    [0xeb683d] ???
    [0x3753238] ???

The transformation rules, the traversal function, and their combined use in the transform function do work on their own (that is, without insert_into_new_config).

Crystal's strict typing is giving me a really hard time. I genuinely cannot figure out how to make the assignment current_level[key_string] = value work. No combination of type casts with as makes it compile. I am probably doing something very wrong.
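A minimal sketch of a version that does type-check, assuming a reshaped alias (ConfigValue and ConfigTransformer are hypothetical names). In the original alias, a nested value has to be a Hash(String, NestedHash), i.e. a hash whose values are again whole NestedHash hashes, so a second-level key such as white_list_container_names can never hold a String. Also, current_level = current_level[key_string] widens current_level to the full union, and []= with a String key does not work for its String and Array(String) members. Finally, assigning a fresh {} on every intermediate key replaces any existing mapping, so a second rule under common_parameters would erase the first. Narrowing with is_a? fixes the typing; whether it also avoids the compiler segfault is not certain, since that may be a separate compiler bug.

```crystal
require "yaml"

# A node of the new config: a leaf String, a list of strings, or a nested
# mapping whose values are again ConfigValue.
alias ConfigValue = String | Array(String) | Hash(String, ConfigValue)
alias NestedHash = Hash(String, ConfigValue)

class ConfigTransformer
  getter new_config : NestedHash = NestedHash.new

  def insert_into_new_config(new_key : Array(Symbol), value : String | Array(String)) : Nil
    current_level = @new_config
    # Walk every key except the last, descending one mapping per key.
    new_key[0...-1].each do |key|
      key_string = key.to_s
      child = current_level[key_string]?
      # Reuse an existing mapping; if the key is missing (or holds a leaf),
      # replace it with an empty one. Either way `child` is a NestedHash
      # afterwards, so `current_level` keeps a single concrete type.
      unless child.is_a?(NestedHash)
        child = NestedHash.new
        current_level[key_string] = child
      end
      current_level = child
    end
    current_level[new_key.last.to_s] = value
  end
end

transformer = ConfigTransformer.new
transformer.insert_into_new_config([:common_parameters, :white_list_container_names], ["coredns"])
transformer.insert_into_new_config([:common_parameters, :docker_insecure_registries], ["registry.local"])
puts transformer.new_config.to_yaml
```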

Any ideas are welcome.

Would it be easier to skip steps 2 and 3 and use YAML::Builder to build the YAML directly from the old config and the transformation rules?
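On skipping steps 2 and 3: YAML::Builder writes its output as a stream, so each mapping has to be opened and closed exactly once, and all rules that share a prefix (such as common_parameters) must be grouped before anything is written. The sketch below does that grouping recursively. Leaf, emit_leaf, emit_tree and old_values are hypothetical names; old_values stands in for @old_config.traverse_param.

```crystal
require "yaml"

alias Leaf = String | Array(String)

# Writes a leaf value: a plain scalar or a sequence of scalars.
def emit_leaf(yaml : YAML::Builder, value : Leaf) : Nil
  if value.is_a?(String)
    yaml.scalar value
  else
    yaml.sequence do
      value.each { |item| yaml.scalar item }
    end
  end
end

# Writes one mapping for all entries at `depth`. Entries sharing a key at this
# depth are grouped so each mapping is opened exactly once. If a key is both a
# leaf and a prefix of a longer path, this sketch keeps only the leaf.
def emit_tree(yaml : YAML::Builder, entries : Array({Array(Symbol), Leaf}), depth : Int32) : Nil
  yaml.mapping do
    entries.group_by { |entry| entry[0][depth] }.each do |key, group|
      yaml.scalar key.to_s
      leaf = group.find { |entry| entry[0].size == depth + 1 }
      if leaf
        emit_leaf(yaml, leaf[1])
      else
        emit_tree(yaml, group, depth + 1)
      end
    end
  end
end

# Hypothetical stand-in for @old_config.traverse_param: old key path => value.
old_values = {
  [:white_list_container_names] => ["coredns"].as(Leaf),
  [:image_registry_fqdns]       => ["registry.local"].as(Leaf),
}

rules = [
  {old_key: [:white_list_container_names], new_key: [:common_parameters, :white_list_container_names]},
  {old_key: [:image_registry_fqdns], new_key: [:common_parameters, :image_registry_fqdns]},
]

# Resolve values first and drop rules whose old key is absent, because a
# streaming builder cannot take back a key it has already written.
entries = [] of {Array(Symbol), Leaf}
rules.each do |rule|
  if value = old_values[rule[:old_key]]?
    entries << {rule[:new_key], value}
  end
end

output = YAML.build { |builder| emit_tree(builder, entries, 0) }
puts output
```

The grouping step still has to see all rules before emitting, so the builder removes the temporary object but not the work of organizing the rules by key path.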

I’m a bit lost on why a class-to-class transformation can’t happen, even if it’s not direct. Given that you know which data should be transformed into the new format, you should be able to build a serializable type that omits keys which may appear in one config but not in the other. If you want, these keys can even be included in the serialized YAML file using @[YAML::Field(emit_null: true)].
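The class-to-class suggestion could look roughly like this: a hypothetical intermediate type mirroring the v2 layout (CommonParameters and IntermediateConfig are illustrative names, not from the project), where keys without a value are still written out as null thanks to emit_null.

```crystal
require "yaml"

# Hypothetical intermediate types mirroring the v2 layout.
class CommonParameters
  include YAML::Serializable

  # emit_null: true writes the key even when the value is nil.
  @[YAML::Field(emit_null: true)]
  property white_list_container_names : Array(String)?

  @[YAML::Field(emit_null: true)]
  property docker_insecure_registries : Array(String)?

  def initialize(@white_list_container_names = nil, @docker_insecure_registries = nil)
  end
end

class IntermediateConfig
  include YAML::Serializable

  property common_parameters : CommonParameters

  def initialize(@common_parameters)
  end
end

# Only one value is known here; the other key is still emitted (as null).
config = IntermediateConfig.new(
  CommonParameters.new(white_list_container_names: ["coredns"])
)
output = config.to_yaml
puts output
```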

Now that you say it, that does sound like a viable solution. The problem is the code duplication that would follow. We did not want to transform class-to-class (that is, config v1 class to config v2 class) because it would partly break the initialization of the config v2 class and remove the validation provided by YAML::Serializable::Strict. Additionally, I wanted this solution to be generic: just define a rule set, which is easy to read, and transform from anything to anything. Having the intermediate type structure predefined would be a problem if someone wanted to create a new rule set in the future, since they would have to write all of that as well. There is also the question of what to do with unused optional keys. I am well aware that this does not entirely make sense, but we want to make future extensibility as easy as possible.

This seems to be the correct approach.