I have the following snippet, which doesn’t compile.
It can be a bug in the compiler, or it might be that I lack some information about the type system.
The concept is to have a parser, which can postprocess the tokens. Each token would store a Proc, which is called every time the token is successfully parsed. The Proc receives the parsed text (String or Char) as argument, and returns a ParseResult.
This snippet is extracted from a larger codebase of a parser combinator, for demonstrating the issue.
I know that generics could be used for this snippet (make GrammarToken a generic class), but in the large codebase it would reduce flexibility when using the library.
Have I found a bug in the Crystal compiler, or I messed up something?
edit: this is an incomplete code snippet, with an unrelated error, see the complete version below in the comments.
# Result of postprocessing
class AstNode
end
# Parse result can either be postprocessed or raw
alias ParseResult = AstNode|Char|String
abstract class GrammarToken
# This is a getter, but it's not known
# what type the @postprocessor variable
# will have in subclasses, so it's abstract
# It will have type of `Proc(T, ParseResult)`
# `T` is the return type of `#read`
abstract def get_postprocessor
# The return type is not known yet
# If it would be union of all possible tokens types
# (this is what `ParseResult` is for),
# then it would compile, but the user-specified post processor block
# would have to type cast it most of the time to a stricter type
abstract def read(reader : Char::Reader)
def parse(reader : Char::Reader) : ParseResult
result = parse reader
get_postprocessor.call result
end
end
# Reads a Char
class CharToken < GrammarToken
def initialize(&block : Char -> ParseResult)
@postprocessor = block
end
def get_postprocessor
@postprocessor
end
# The return type is `Char`, and the first
# argument of `@postprocessor` is also `Char`
def read(reader : Char::Reader) : Char
reader.next_char
end
end
# Reads two Chars and converts to String
class StringToken < GrammarToken
def initialize(&block : String -> ParseResult)
@postprocessor = block
end
def get_postprocessor
@postprocessor
end
# The return type is `String`, and the first
# argument of `@postprocessor` is also `String`
def read(reader : Char::Reader) : String
"#{reader.next_char}#{reader.next_char}"
end
end
char_token = CharToken.new do |i|
i
end
string_token = StringToken.new do |i|
i
end
array = [] of ParseResult
reader = Char::Reader.new "abcde"
array << char_token.parse reader
array << string_token.parse reader
puts array
Error isn’t the clearest but I think your issue was you had result = parse reader and not result = read reader. This was causing it to fail since it was trying to use a ParseResult as the argument to the postprocessor proc instead of a Char. Changing that makes it compile and print ['b', "bc"] which seems to be the expected output.
@Blacksmoke16 Thanks for the help, it indeed fixed this snippet. However, in the larger codebase this wasn’t the issue. I extracted some more code which can reproduce the exact same error message.
Turns out, adding a VariationsToken can reproduce the error. This token takes two GrammarToken instances, and tries to read the LHS. If it fails, then read RHS. If it succeeds, return RHS, otherwise the whole token fails. It works similarly to this:
variations = a | b
The error is about overloads not matching. But I don’t know where the errorneous overload is instantiated.
The CharToken postprocessor (accepts Char) seems to get called with the VariationsToken results which is ParseResult. But the code never calls it.
Is it something that happens when instantiating abstract class methods, or something similar, caused by type inference? I specified return types wherever I could, to help avoid wrong type inference.
Error message:
# Result of postprocessing
class AstNode
@content : String
def initialize(@content)
end
end
class ParseException < Exception
end
alias ParseResult = AstNode|Char|String
abstract class GrammarToken
# This is a getter, but it's not known
# what type the @postprocessor variable
# will have in subclasses, so it's abstract
# It will have type of `Proc(T, ParseResult)`
# `T` is the return type of `#read`
abstract def get_postprocessor
# The return type is not known yet
abstract def read(reader : Char::Reader)
def parse(reader : Char::Reader) : ParseResult
result = read reader
return get_postprocessor.call result
end
end
# Reads a Char
class CharToken < GrammarToken
@grammar = 'a'
def initialize(&block : Char -> ParseResult)
@postprocessor = block
end
def get_postprocessor
@postprocessor
end
# The return type is `Char`, and the first
# argument of `@postprocessor` is also `Char`
def read(reader : Char::Reader) : Char
next_char = reader.next_char
if @grammar == next_char
return next_char
else
raise ParseException.new
end
end
end
# Reads two Chars and converts to String
class StringToken < GrammarToken
@grammar = "bc"
def initialize(&block : String -> ParseResult)
@postprocessor = block
end
def get_postprocessor
@postprocessor
end
# The return type is `String`, and the first
# argument of `@postprocessor` is also `String`
def read(reader : Char::Reader) : String
next_str = "#{reader.next_char}#{reader.next_char}"
if @grammar == next_str
return next_str
else
raise ParseException.new
end
end
end
class VariationsToken < GrammarToken
@postprocessor : Proc(ParseResult, ParseResult)
def get_postprocessor
@postprocessor
end
def initialize(@lhs : GrammarToken, @rhs : GrammarToken, &block : ParseResult -> ParseResult)
@postprocessor = block
end
def read(reader : Char::Reader) : ParseResult
begin
return @lhs.parse reader
rescue exception : ParseException
return @rhs.parse reader
end
end
end
char_token = CharToken.new do |i|
i
end
string_token = StringToken.new do |i|
AstNode.new i
end
variations = VariationsToken.new string_token, char_token do |selected|
selected
end
array = [] of ParseResult
reader = Char::Reader.new "abcde"
array << char_token.parse reader
array << string_token.parse reader
array << variations.parse reader
puts array
The main issue is that get_postprocessor returns Proc(Char, ...) in one case, Proc(String, ...) in another, but the result of read when you combine multiple parsers is any possible result type (Char | String | ParseResult) and you can’t pass such a type to Proc(Char, ...).
class StringToken < GrammarToken(String, ParseResult)
class VariationsToken(L, R) < GrammarToken(ParseResult, ParseResult). Where L and R are used to type the lhs and rhs ivars, def initialize(@lhs : L, @rhs : R, &block : ParseResult -> ParseResult).
Where you also define each get_postprocessor with the correct return type: def get_postprocessor : Proc(Char, ParseResult).
Making that change also allows it to compile. Tho not sure what the pros/cons between the two would be. Ultimately @asterite’s solution would prob be simpler as you avoid the issues you could run into when using generics.
I know that multiple parsers combined is the union of any possible result type. But only VariationsToken combines multiple parsers, and its postprocessor accepts ParseResult. This code never calls CharToken postprocessor with VariationsToken results.
Is there some logic in the compiler that combines these types for the abstract class, even if these are not called directly?
Another question:
The original reason to not have ParseResult as the argument type of the postprocessor is because if it was, then the user defined postprocessor blocks would have to do type casting.
In the larger codebase ParseResult includes more types, e.g. arrays.
For example, to repeat a given token, then concatenate the results into one item:
concat_token = RepeatToken.new char_token do |result|
# Compile error: #join is not defined for `Char`
result.join
# Compiles, but it's less readable, and also unnecessary
# because RepeatToken always parses into an array
result.as(Array(ParseResult)).join
end
I tested it, and it still requires type casting. I think the only solution could be to figure out why does the CharToken postprocessor getting called with the VariationsToken postprocessor arguments.
Because @lhs can be any GrammarToken, the compiler tries to type parse. What’s parse? This:
def parse(reader : Char::Reader) : ParseResult
result = read reader
return get_postprocessor.call result
end
And here, for a GrammarToken, what’s the type of get_postprocessor? Well, it can be any of the subtypes get_postprocessor… and in that case we can’t pass result.
Interestingly, if you define parse in each GrammarToken then it works. The reason is that the dispatch will now hit parse on every concrete type, and calling get_postprocessor on a concrete type will give you the type that’s needed.
This could be seen as a limitation of the language. But changing this would already introduce more bloat than what currently exists (in the generated code/binary.)
Does this mean that if a method is defined in the parent class, then the return type will be inferred to be the union of the return type in each subclass? I assumed that in this case, instead of combining the types, a separate method instance will be created for every subclass where it differs.
There is one more thing I don’t understand: in the original snippet, without VariationsToken (only CharToken and StringToken), it wasn’t an issue despite both token types have different argument types for the postprocessor (Char, and String).
Yes, if the type you have is the combination of several types (the abstract GrammarType) here.
Right, because in that case you never had an instance variable of type GrammarType, which means “any possible grammar type”. You always had concrete types like CharToken and StringToken.
All of this said, I think the generics approach shown by @Blacksmoke16 is probably the best way to do it.
Another solution could be to set the @postprocessor variable to Proc(Char, ParseResult)|Proc(String, ParseResult)|Proc(AstNode, ParseResult), and use the AstNode|Char|String union as the return type of read, then type cast them to a given signature, e.g.
def parse(reader) : ParseResult
result = read reader
runtime_type = typeof(result)
get_postprocessor.as(Proc(runtime_type, ParseResult)).call result.as(runtime_type)
end
(this snippet doesn’t work, it’s just a draft)
Is it possible in Crystal somehow? How to determine the the type of a variable using runtime reflection?
Generics are unfortunately too inflexible. For example, it is not possible to pass Array(String) argument to a method that accepts Array(String|Int32). This would cause lot of complications when creating larger tokens combined of several smaller. Currently, every token is GrammarToken, but with generics, it would not be possible to build a token sequence operator where the two sides are not equally complex, e.g. the LHS is GrammarToken(String), and RHS is GrammarToken(GrammarToken(String|Char)), because type casting would not work.
You deff can do this. However if you pass Array(String) to a constructor that is expecting Array(Char|String) youd have to handle that. E.g. @char_or_string_array = string_array.map &.as(String|Char).
Could try adding a non generic parent abstract class. E.g. something like abstract class Token; end then abstract class GrammarToken(K, V) < Token. Not sure how that would play out tho, but it would allow you to have like @lhs : Token which could be any token.
Upon further investigation, this turned out to be the solution to my problem.
The method had to be wrapped in the macro inherited hook to make it work.
No generics mess, and no type casting needed. Just 2 lines of code.
# original
def parse(reader : Char::Reader) : ParseResult
result = read reader
return get_postprocessor.call result
end
# fixed
macro inherited
def parse(reader : Char::Reader) : ParseResult
result = read reader
return get_postprocessor.call result
end
end
I am marking the quoted comment as solution, because it contains the information needed to solve my issue. However, @Blacksmoke16 also gave me some ideas regarding generics, which I will investigate more, for different use cases.