Type system issues

I have the following snippet, which doesn’t compile.
It can be a bug in the compiler, or it might be that I lack some information about the type system.

The concept is to have a parser, which can postprocess the tokens. Each token would store a Proc, which is called every time the token is successfully parsed. The Proc receives the parsed text (String or Char) as argument, and returns a ParseResult.
This snippet is extracted from a larger codebase of a parser combinator, for demonstrating the issue.
I know that generics could be used for this snippet (make GrammarToken a generic class), but in the large codebase it would reduce flexibility when using the library.

Have I found a bug in the Crystal compiler, or I messed up something?

edit: this is an incomplete code snippet, with an unrelated error, see the complete version below in the comments.

# Result of postprocessing
class AstNode
end

# Parse result can either be postprocessed or raw
alias ParseResult = AstNode|Char|String

abstract class GrammarToken
  # This is a getter, but it's not known
  # what type the @postprocessor variable
  # will have in subclasses, so it's abstract
  # It will have type of `Proc(T, ParseResult)`
  # `T` is the return type of `#read`
  abstract def get_postprocessor

  # The return type is not known yet
  # If it would be union of all possible tokens types
  # (this is what `ParseResult` is for),
  # then it would compile, but the user-specified post processor block
  # would have to type cast it most of the time to a stricter type
  abstract def read(reader : Char::Reader)

  def parse(reader : Char::Reader) : ParseResult
    result = parse reader
    get_postprocessor.call result
  end
end

# Reads a Char
class CharToken < GrammarToken
  def initialize(&block : Char -> ParseResult)
    @postprocessor = block
  end

  def get_postprocessor
    @postprocessor
  end

  # The return type is `Char`, and the first
  # argument of `@postprocessor` is also `Char`
  def read(reader : Char::Reader) : Char
    reader.next_char
  end
end

# Reads two Chars and converts to String
class StringToken < GrammarToken
  def initialize(&block : String -> ParseResult)
    @postprocessor = block
  end

  def get_postprocessor
    @postprocessor
  end

  # The return type is `String`, and the first
  # argument of `@postprocessor` is also `String`
  def read(reader : Char::Reader) : String
    "#{reader.next_char}#{reader.next_char}"
  end
end

char_token = CharToken.new do |i|
  i
end

string_token = StringToken.new do |i|
  i
end

array = [] of ParseResult

reader = Char::Reader.new "abcde"
array << char_token.parse reader
array << string_token.parse reader

puts array

Error isn’t the clearest but I think your issue was you had result = parse reader and not result = read reader. This was causing it to fail since it was trying to use a ParseResult as the argument to the postprocessor proc instead of a Char. Changing that makes it compile and print ['b', "bc"] which seems to be the expected output.

@Blacksmoke16 Thanks for the help, it indeed fixed this snippet. However, in the larger codebase this wasn’t the issue. I extracted some more code which can reproduce the exact same error message.
Turns out, adding a VariationsToken can reproduce the error. This token takes two GrammarToken instances, and tries to read the LHS. If it fails, then read RHS. If it succeeds, return RHS, otherwise the whole token fails. It works similarly to this:

variations = a | b

The error is about overloads not matching. But I don’t know where the errorneous overload is instantiated.
The CharToken postprocessor (accepts Char) seems to get called with the VariationsToken results which is ParseResult. But the code never calls it.
Is it something that happens when instantiating abstract class methods, or something similar, caused by type inference? I specified return types wherever I could, to help avoid wrong type inference.
Error message:

bug.cr:69:21: 

 69 | array << char_token.parse reader
                          ^----
Error: instantiating 'CharToken#parse(Char::Reader)'


bug.cr:20:23: 

 20 | get_postprocessor.call result
                        ^---
Error: no overload matches 'Proc(Char, (AstNode | Char | String))#call' with type (AstNode | Char | String)

Overloads are:
 - Proc(*T, R)#call(*args : *T)

Code:

# Result of postprocessing
class AstNode
  @content : String
  def initialize(@content)
  end
end

class ParseException < Exception
end

alias ParseResult = AstNode|Char|String

abstract class GrammarToken
  # This is a getter, but it's not known
  # what type the @postprocessor variable
  # will have in subclasses, so it's abstract
  # It will have type of `Proc(T, ParseResult)`
  # `T` is the return type of `#read`
  abstract def get_postprocessor

  # The return type is not known yet
  abstract def read(reader : Char::Reader)

  def parse(reader : Char::Reader) : ParseResult
    result = read reader
    return get_postprocessor.call result
  end
end

# Reads a Char
class CharToken < GrammarToken
  @grammar = 'a'

  def initialize(&block : Char -> ParseResult)
    @postprocessor = block
  end

  def get_postprocessor
    @postprocessor
  end

  # The return type is `Char`, and the first
  # argument of `@postprocessor` is also `Char`
  def read(reader : Char::Reader) : Char
    next_char = reader.next_char
    if @grammar == next_char
      return next_char
    else
      raise ParseException.new
    end
  end
end

# Reads two Chars and converts to String
class StringToken < GrammarToken
  @grammar = "bc"

  def initialize(&block : String -> ParseResult)
    @postprocessor = block
  end

  def get_postprocessor
    @postprocessor
  end

  # The return type is `String`, and the first
  # argument of `@postprocessor` is also `String`
  def read(reader : Char::Reader) : String
    next_str = "#{reader.next_char}#{reader.next_char}"
    if @grammar == next_str
      return next_str
    else
      raise ParseException.new
    end
  end
end

class VariationsToken < GrammarToken
  @postprocessor : Proc(ParseResult, ParseResult)

  def get_postprocessor
    @postprocessor
  end

  def initialize(@lhs : GrammarToken, @rhs : GrammarToken, &block : ParseResult -> ParseResult)
    @postprocessor = block
  end

  def read(reader : Char::Reader) : ParseResult
    begin
      return @lhs.parse reader
    rescue exception : ParseException
      return @rhs.parse reader
    end
  end
end

char_token = CharToken.new do |i|
  i
end

string_token = StringToken.new do |i|
  AstNode.new i
end

variations = VariationsToken.new string_token, char_token do |selected|
  selected
end

array = [] of ParseResult

reader = Char::Reader.new "abcde"

array << char_token.parse reader
array << string_token.parse reader
array << variations.parse reader

puts array

I didn’t read the entire discussion, but it’s always helpful to include the error when saying “I got an error”.

Thanks for the suggestion. I edited the above comment to include the error message.

The main issue is that get_postprocessor returns Proc(Char, ...) in one case, Proc(String, ...) in another, but the result of read when you combine multiple parsers is any possible result type (Char | String | ParseResult) and you can’t pass such a type to Proc(Char, ...).

If you force get_postprocessor to have this type:

  abstract def get_postprocessor : ParseResult -> ParseResult

and implement it like this for CharToken:

  def get_postprocessor : ParseResult -> ParseResult
    ->(result : ParseResult) { @postprocessor.call(result.as(Char)) }
  end

(and similarly for StringToken)

then it compiles fine.

Another approach would be to use generics. E.g. something like:

abstract class GrammarToken(K, V)
  abstract def get_postprocessor : Proc(K, V)

  # ...

Then have:

  1. class CharToken < GrammarToken(Char, ParseResult)
  2. class StringToken < GrammarToken(String, ParseResult)
  3. class VariationsToken(L, R) < GrammarToken(ParseResult, ParseResult). Where L and R are used to type the lhs and rhs ivars, def initialize(@lhs : L, @rhs : R, &block : ParseResult -> ParseResult).

Where you also define each get_postprocessor with the correct return type: def get_postprocessor : Proc(Char, ParseResult).

Making that change also allows it to compile. Tho not sure what the pros/cons between the two would be. Ultimately @asterite’s solution would prob be simpler as you avoid the issues you could run into when using generics.

I know that multiple parsers combined is the union of any possible result type. But only VariationsToken combines multiple parsers, and its postprocessor accepts ParseResult. This code never calls CharToken postprocessor with VariationsToken results.
Is there some logic in the compiler that combines these types for the abstract class, even if these are not called directly?

Another question:
The original reason to not have ParseResult as the argument type of the postprocessor is because if it was, then the user defined postprocessor blocks would have to do type casting.
In the larger codebase ParseResult includes more types, e.g. arrays.
For example, to repeat a given token, then concatenate the results into one item:

concat_token = RepeatToken.new char_token do |result|
  # Compile error: #join is not defined for `Char`
  result.join

  # Compiles, but it's less readable, and also unnecessary
  # because RepeatToken always parses into an array
  result.as(Array(ParseResult)).join
end

Does this solution solve this problem too?

I tested it, and it still requires type casting. I think the only solution could be to figure out why does the CharToken postprocessor getting called with the VariationsToken postprocessor arguments.

Looking at the full error trace:

In foo.cr:112:12

 112 | variations.parse reader
                  ^----
Error: instantiating 'VariationsToken#parse(Char::Reader)'


In foo.cr:26:14

 26 | result = read reader
               ^---
Error: instantiating 'read(Char::Reader)'


In foo.cr:92:19

 92 | return @lhs.parse reader
                  ^----
Error: instantiating 'GrammarToken+#parse(Char::Reader)'


In foo.cr:27:30

 27 | return get_postprocessor.call result
                               ^---
Error: no overload matches 'Proc(Char, (AstNode | Char | String))#call' with type (AstNode | Char | String)

Overloads are:
 - Proc(*T, R)#call(*args : *T)

The relevant part of the error seems to be this:

In foo.cr:92:19

 92 | return @lhs.parse reader
                  ^----
Error: instantiating 'GrammarToken+#parse(Char::Reader)'

Because @lhs can be any GrammarToken, the compiler tries to type parse. What’s parse? This:

  def parse(reader : Char::Reader) : ParseResult
    result = read reader
    return get_postprocessor.call result
  end

And here, for a GrammarToken, what’s the type of get_postprocessor? Well, it can be any of the subtypes get_postprocessor… and in that case we can’t pass result.

Interestingly, if you define parse in each GrammarToken then it works. The reason is that the dispatch will now hit parse on every concrete type, and calling get_postprocessor on a concrete type will give you the type that’s needed.

1 Like

This could be seen as a limitation of the language. But changing this would already introduce more bloat than what currently exists (in the generated code/binary.)

Does this mean that if a method is defined in the parent class, then the return type will be inferred to be the union of the return type in each subclass? I assumed that in this case, instead of combining the types, a separate method instance will be created for every subclass where it differs.

There is one more thing I don’t understand: in the original snippet, without VariationsToken (only CharToken and StringToken), it wasn’t an issue despite both token types have different argument types for the postprocessor (Char, and String).

Yes, if the type you have is the combination of several types (the abstract GrammarType) here.

Right, because in that case you never had an instance variable of type GrammarType, which means “any possible grammar type”. You always had concrete types like CharToken and StringToken.

All of this said, I think the generics approach shown by @Blacksmoke16 is probably the best way to do it.

1 Like

Another solution could be to set the @postprocessor variable to Proc(Char, ParseResult)|Proc(String, ParseResult)|Proc(AstNode, ParseResult), and use the AstNode|Char|String union as the return type of read, then type cast them to a given signature, e.g.

def parse(reader) : ParseResult
  result = read reader
  runtime_type = typeof(result)
  get_postprocessor.as(Proc(runtime_type, ParseResult)).call result.as(runtime_type)
end

(this snippet doesn’t work, it’s just a draft)

Is it possible in Crystal somehow? How to determine the the type of a variable using runtime reflection?

I just wouldn’t go in that directions. It seems generics is what’s needed here for type safety and less code.

Generics are unfortunately too inflexible. For example, it is not possible to pass Array(String) argument to a method that accepts Array(String|Int32). This would cause lot of complications when creating larger tokens combined of several smaller. Currently, every token is GrammarToken, but with generics, it would not be possible to build a token sequence operator where the two sides are not equally complex, e.g. the LHS is GrammarToken(String), and RHS is GrammarToken(GrammarToken(String|Char)), because type casting would not work.

You deff can do this. However if you pass Array(String) to a constructor that is expecting Array(Char|String) youd have to handle that. E.g. @char_or_string_array = string_array.map &.as(String|Char).

Could try adding a non generic parent abstract class. E.g. something like abstract class Token; end then abstract class GrammarToken(K, V) < Token. Not sure how that would play out tho, but it would allow you to have like @lhs : Token which could be any token.

1 Like

Upon further investigation, this turned out to be the solution to my problem.
The method had to be wrapped in the macro inherited hook to make it work.
No generics mess, and no type casting needed. Just 2 lines of code.

# original
def parse(reader : Char::Reader) : ParseResult
  result = read reader
  return get_postprocessor.call result
end

# fixed
macro inherited
  def parse(reader : Char::Reader) : ParseResult
    result = read reader
    return get_postprocessor.call result
  end
end

I am marking the quoted comment as solution, because it contains the information needed to solve my issue. However, @Blacksmoke16 also gave me some ideas regarding generics, which I will investigate more, for different use cases.

Thank you @asterite and @Blacksmoke16 for your help!