Converting a YAML doc

I am trying to materialize all paths that occur in an arbitrarily-deep document, with no known keys.

Here’s an example on play.crystal-lang.org:
https://play.crystal-lang.org/#/r/a7ey

---
Sports:
- Footall:
   - Trophy:
     - Vince Lombardi
   - Teams:
     - Packers:
       - QB:
         - Aaron Rogers
     - Vikings:
       - QB:
         - Kirk Cousins
         - Sean Mannion
- Hockey:
   - Teams:
     - Bruins
     - Canadiens
   - Trophy:
     - Stanley Cup

Desired output:

[["Sports", "Footall", "Trophy", "Vince Lombardi"],
 ["Sports", "Footall", "Teams", "Packers", "QB", "Aaron Rogers"],
 ["Sports", "Footall", "Teams", "Vikings", "QB", "Kirk Cousins"],
 ["Sports", "Footall", "Teams", "Vikings", "QB", "Sean Mannion"],
 ["Sports", "Hockey", "Teams", "Bruins"],
 ["Sports", "Hockey", "Teams", "Canadiens"],
 ["Sports", "Hockey", "Trophy", "Stanley Cup"]]

Here’s what I came up with:

def recurse_yaml(input : YAML::Any, path = [] of String, result = [] of Array(String)) : Array(Array(String))
  if (hash_input = input.as_h?)
    hash_input.each do |key, val|
      recurse_yaml(val, path + [key.as_s], result)
    end
  elsif (array_input = input.as_a?)
    array_input.each do |el|
      recurse_yaml(el, path, result)
    end 
  elsif (string_input = input.as_s?)
    result << (path + [string_input])
  end
 
  result
end

I don’t like how much logic there is, or that some of the branches contain their own loops. This just doesn’t seem right. Is there a better way to accomplish this?

Thank you! I absolutely love the language, and thank you all for an absolutely amazing bit of tech!

Define a struct to match your requirement, and then convert the YAML to that struct using Sports.from_yaml(yamldata). Refer to: https://crystal-lang.org/api/0.35.1/YAML/Serializable.html
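For readers unfamiliar with YAML::Serializable, here is a minimal sketch of what that approach looks like. It assumes a fixed schema whose keys are known ahead of time; the Team and League types and the sample document are hypothetical and are not the document from the question.

```crystal
require "yaml"

# Hypothetical fixed schema: every key must be known in advance.
struct Team
  include YAML::Serializable

  getter name : String
  getter qb : Array(String)
end

struct League
  include YAML::Serializable

  getter trophy : String
  getter teams : Array(Team)
end

# Sample document invented for this sketch.
yaml = <<-DOC
  trophy: Vince Lombardi
  teams:
    - name: Packers
      qb: [Aaron Rogers]
  DOC

league = League.from_yaml(yaml)
p league.teams.first.qb # => ["Aaron Rogers"]
```

This only helps when the shape of the data is fixed; it does not by itself enumerate paths through a document with unknown keys.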

I don’t think there’s any way to simplify the implementation significantly. Traversing a YAML doc tree is not a generic algorithm as it very much depends on specifics. For example, you’re just skipping over sequences, others might want to incorporate element indices.
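To illustrate the point about sequences, here is a minimal sketch of a variant that records element indices as path segments instead of skipping them. The name paths_with_indices and the "[i]" segment format are assumptions made for this example.

```crystal
require "yaml"

# Hypothetical variant: each sequence position becomes an "[i]" path
# segment, so sibling elements can be told apart in the output.
def paths_with_indices(input : YAML::Any, path = [] of String, result = [] of Array(String)) : Array(Array(String))
  case raw = input.raw
  when Hash
    raw.each { |key, val| paths_with_indices(val, path + [key.as_s], result) }
  when Array
    raw.each_with_index { |el, i| paths_with_indices(el, path + ["[#{i}]"], result) }
  when String
    result << (path + [raw])
  end

  result
end

doc = YAML.parse("Teams:\n  - Bruins\n  - Canadiens\n")
p paths_with_indices(doc)
# => [["Teams", "[0]", "Bruins"], ["Teams", "[1]", "Canadiens"]]
```

Whether indices belong in the path at all depends on what the paths are used for, which is why the traversal is hard to generalize.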

@aravindavk
Defining a custom data type would not particularly help reduce the code for iterating over the data.
Furthermore, based on the problem description, the structure of the data is unknown (“arbitrarily-deep document”).

I see two possible variations, similar to yours:

def recurse_yaml(input : YAML::Any, path = [] of String, result = [] of Array(String)) : Array(Array(String))
  case raw = input.raw
  when Hash
    raw.each do |key, val|
      recurse_yaml(val, path + [key.as_s], result)
    end
  when Array
    raw.each do |el|
      recurse_yaml(el, path, result)
    end
  when String
    result << (path + [raw])
  end

  result
end

p recurse_yaml any

struct S
  getter result = Array(Array(String)).new

  def initialize(any)
    parse any.raw
  end

  private def parse(h : Hash, path = Array(String).new)
    h.each { |k, v| parse v.raw, path + [k.as_s] }
  end

  private def parse(a : Array, path = Array(String).new)
    a.each { |e| parse e.raw, path }
  end

  private def parse(s : String, path = Array(String).new)
    @result << path + [s]
  end

  private def parse(*x)
  end
end

p S.new(any).result

It is essentially overload vs case/when dispatch.
I think I prefer the struct solution, because it is obvious that only the parse(String) overload mutates the result, and each method is very simple.

I love the struct solution; it is much more readable. Thank you!
