Are inferred types "frozen" for crystal shards and the standard library?

I’ve started reading through some of the code in the standard library. I saw that a lot of methods already have type definitions, but I found a few cases that rely on type inference.

I was wondering if the types in the standard library and shards are treated differently from application code, or is there currently no distinction?

When I’m developing and testing an application, I’m only going to be modifying my application source code; the standard library and any Crystal shards will never change. So would it make sense to perform a first pass on the standard library + shards, where we “lock down” the inferred types for any of these static files? (Or we could say: bake / freeze / crystallize?) And could this even be done as a pre-release step whenever a developer publishes a new Crystal shard? (E.g. maybe a crystal tool format option that “expands” all of the inferred types and writes out new Crystal files into a build directory?)

As an example, I’m looking at json/from_json.cr:11:

def Object.from_json(string_or_io)
  parser = JSON::PullParser.new(string_or_io)
  new parser
end

So it makes sense that I get this type error:

$ crystal eval 'require "json"; Array.from_json(123)'
Showing last frame. Use --error-trace for full trace.

In /usr/local/Cellar/crystal/0.31.1/src/json/pull_parser.cr:49:20

 49 | @lexer = Lexer.new input
                     ^--
Error: no overload matches 'JSON::Lexer.new' with type Int32

Overloads are:
 - JSON::Lexer.new(string : String)
 - JSON::Lexer.new(io : IO)
 - JSON::Lexer.new()

I can see that the type error is actually coming from the typed call Lexer.new input inside JSON::PullParser.new(string_or_io).

So if I open /usr/local/Cellar/crystal/0.31.1/src/json/pull_parser.cr:49:20 and add the type annotation:

def initialize(input : String | IO)

Then that moves the error further down the chain:

In /usr/local/Cellar/crystal/0.31.1/src/json/from_json.cr:12:29

 12 | parser = JSON::PullParser.new(string_or_io)
                                ^--
Error: no overload matches 'JSON::PullParser.new' with type Int32

Overloads are:
 - JSON::PullParser.new(input : String | IO)

And then, adding type annotations in /usr/local/Cellar/crystal/0.31.1/src/json/from_json.cr:12:29 as well:

def Object.from_json(string_or_io : String | IO) : self
  parser = JSON::PullParser.new(string_or_io)
  new parser
end
Now the error appears at the top level:

$ crystal eval 'require "json"; Array.from_json(123)'
Showing last frame. Use --error-trace for full trace.

error in line 1
Error: no overload matches 'Array(T).from_json' with type Int32

Overloads are:
 - Array(T).from_json(string_or_io, &block)
 - Object.from_json(string_or_io, root : String)
 - Object.from_json(string_or_io : String | IO)

I actually find this error message much more obvious than the original error that talks about @lexer = Lexer.new input. And this would allow the standard library developers to still use type inference wherever it’s convenient. But all of the inferred types in the standard library could become expanded and “frozen”.

Is Crystal already doing this behind the scenes? If not, what would be some of the downsides?

So, Crystal is pretty unique in how it types things.

If you feed it this:

def Object.from_json(string_or_io)
  parser = JSON::PullParser.new(string_or_io)
  new parser
end

you might think, based on how other languages work (particularly Haskell or Elm, where you can omit the types of arguments and the compiler will infer them), that the compiler will infer the type of that method and conclude that string_or_io is String | IO.

However, that’s not how Crystal works.

The way Crystal works is that methods are always untyped. The compiler doesn’t know their type, and there’s no need to know it. They only get a type when you call them, and that type is assigned not to the method itself but to an instantiation of the method (think C++ templates, or generics). Only then will the compiler go ahead and infer the type of the body. But argument types are never inferred.
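
To make that concrete, here’s a minimal sketch (identity is a made-up example):

def identity(x)
  x
end

# `identity` itself has no type. Each call produces its own
# instantiation, typed from the actual argument:
puts typeof(identity(1))       # => Int32
puts typeof(identity("hello")) # => String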

That’s why there’s no way to “lock down” types from shards or from anything else.

And all of this is of course related to “why it’s difficult to do incremental compilation in Crystal”.

4 Likes

Thanks for your reply! I think I am starting to understand what you mean, but I am still a bit confused, so I would like to talk about this concrete example.

Given the following source code:

# https://github.com/crystal-lang/crystal/blob/master/src/json/lexer.cr#L3-L10
abstract class JSON::Lexer
  def self.new(string : String)
    StringBased.new(string)
  end

  def self.new(io : IO)
    IOBased.new(io)
  end
end

# https://github.com/crystal-lang/crystal/blob/master/src/json/pull_parser.cr#L48-L49
class JSON::PullParser
  def initialize(input)
    @lexer = Lexer.new input
    # ...
  end
end

# https://github.com/crystal-lang/crystal/blob/master/src/json/from_json.cr#L11-L14
def Object.from_json(string_or_io)
  parser = JSON::PullParser.new(string_or_io)
  new parser
end

And this code from an application:

# src/main.cr
require "json"
Array.from_json(123)

Crystal will fail with this type error:

Error: no overload matches 'JSON::Lexer.new' with type Int32

So Crystal needed to perform some type inference through this chain:

  • Array.from_json(123) =>
    • Object.from_json(string_or_io) =>
      • JSON::PullParser.new(string_or_io) =>
        • @lexer = Lexer.new input
          • abstract class JSON::Lexer; self.new(string : String)
          • abstract class JSON::Lexer; self.new(io : IO)

In this case, would the Crystal type-checking algorithm have better performance if Object.from_json were given a type annotation of String | IO? E.g. would it be able to find the type error immediately and fail faster? And in the case where there are no type conflicts, would these explicit type annotations speed up the type-checking phase?

(I guess I was assuming that explicit type annotations do help to improve performance and reduce the amount of time and memory used for type inference, but that might be totally wrong!)

So even if the current compiler / type inference algorithm doesn’t look at methods until they are called, what if there were a separate type inference algorithm, optimized for this case, that knew how to infer all of these method types by working backwards:

  • Object.from_json(string_or_io : String | IO) =>
    • JSON::PullParser.new(string_or_io : String | IO) =>
      • @lexer = Lexer.new input
        • abstract class JSON::Lexer; self.new(string : String)
        • abstract class JSON::Lexer; self.new(io : IO)

So:

  • If this kind of type inference engine did exist, and
  • it was able to emit code (or AST nodes) that included as much type information as possible, and
  • this pre-compilation step could be run once during the first compilation, and then cached (or even run in advance before releasing a new shard or Crystal version)

Then would this help to speed up the type inference and type checking phases?

Thanks again for your time!

Typed method restrictions: Hold my tide pod!!

2 Likes

So in the absence of type annotations the compiler will just try to type the method and error if it fails (which is what you see when it gets to the Lexer). If there’s a type annotation the process is slightly slower: we first have to check that the types actually match those in the restriction. So adding type annotations is actually slower to compile! I didn’t benchmark this; I know it because in one case it’s a no-op and in the other there’s some matching going on. However, I believe these times are insignificant.

And yes, if you put String | IO as a type restriction then you will get the error at the Object level, not at the Lexer level.

That’s not compatible with how the current language works.

Take this example:

module Moo
  abstract def moo
end

class Foo
  include Moo

  def moo
    1
  end
end

class Bar
  include Moo

  def moo
    "hello"
  end
end

def say_moo(moo : Moo)
  moo.moo
end

puts typeof(say_moo(Foo.new))
puts typeof(say_moo(Bar.new))

The output is:

Int32
String

The type restriction moo : Moo is just that, a restriction (that’s why we call it that). It doesn’t give moo the type Moo and let the compiler infer the type of moo.moo from all possible instantiations. If it did, the type you would get in both cases would be Int32 | String.
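
For contrast, here’s a small sketch of what happens when something really does have the type Moo, e.g. an instance variable annotated as such (Holder is a made-up name; Foo, Bar and Moo are as defined above):

class Holder
  def initialize(@moo : Moo)
  end

  def moo
    # @moo really has the type Moo here, so this call is a
    # multi-dispatch over every instantiated type that includes Moo:
    @moo.moo
  end
end

puts typeof(Holder.new(Foo.new).moo) # => (Int32 | String)
puts typeof(Holder.new(Bar.new).moo) # => (Int32 | String)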

I’m not saying it’s not something that we could do, it’s just that if we go that route the language changes and it becomes a bit less intuitive.

I think what you propose might be possible, though it relies on most users adding full type annotations to their methods. For example, imagine I do this:

class Foo
  def foo
    10
  end
end

def something(x : Foo)
  x.foo
end

Easy: Foo#foo has the type Int32 so we can type this.

Now someone inherits Foo:

class Bar(T) < Foo
  def initialize(@x : T)
  end

  def foo
    @x.baz
  end
end

Now we don’t know the type of foo for this subclass, because it’s generic.

Maybe the problem is that T is not constrained to some type that defines a baz method for which we could know the type. We could maybe change that by introducing interfaces…

I think that as soon as we try to type something prematurely, we’ll run into situations where we’d want all boundaries (methods, instance vars, args, return types) to be typed so we can be sure that we can type things. Relying on users putting type annotations everywhere is not a good idea.

In some cases some methods can’t be typed at all. For example Enumerable#sum: you can do [1, 2, 3].sum but you can’t do ['a', 'b'].sum. sum is defined for all Enumerables, but it actually only works for those where there’s a zero method on the element class and a + method on each element. All of this is implicit; it works by duck typing. If we want to type that we need to introduce protocols or interfaces or traits, and then the language becomes more complex.
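
A rough sketch of what sum implicitly relies on (my_sum is a made-up stand-in, not the actual stdlib implementation):

def my_sum(elements : Enumerable(T)) forall T
  result = T.zero                   # needs a class-level `zero`
  elements.each { |e| result += e } # needs `+` on each element
  result
end

puts my_sum([1, 2, 3]) # => 6
# my_sum(['a', 'b'])   # Error: undefined method 'zero' for Char.class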

So maybe some methods with some type restrictions could be fully typed; it’s still not clear to me exactly how to do that in a sound way.

2 Likes

Ary, could you please illustrate the difference between argument-type and return-type inference with one or two very simple methods? Does Crystal generate concrete overloads for all call sites of a method?

Sure!

So for this:

def double(x)
  x + x
end

if you call it like double(1) then the compiler will generate an instantiation for that:

def double<Int32>:Int32(x : Int32)
  x + x
end

If you call it with double("foo") it will do a similar thing:

def double<String>:String(x : String)
  x + x
end

The names I put there are the actual names the compiler ends up generating, and you can see them if you do --emit llvm-ir and check the generated .ll code.

I’ll actually talk briefly about this on Thursday at the Chicago meetup, but not much more than that; I’ll cover many topics, but none of them in much depth.

3 Likes

Thank you :).

I see. So the compiler is like: this program needs a double that receives an Int32, and I am going to make that possible unless proved wrong.

You know what? That sounds like “convergence”, or even “crystallization” :grinning: to me. “Inference” suggests something more directed, bottom-up; though perhaps I’m wrong about that, since I am not familiar with this topic.

So, when you get a binary, that’s one particular instantiation of everything that satisfied all constraints. And compiling one library together with some other client code would generally result in a different set of method instances.

Is that the reason why the prelude is compiled every time a program is compiled?

1 Like

Yes, indeed! That’s exactly how things work.

And you are right that calling it “type inference” is a bit misleading. We are not inferring anything; we are just typing things top-down. Well, there is quite a fair amount of logic involved in computing these types, especially with is_a?, if, and creating union types, but nothing is “inferred”, just computed… I guess.
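
For example, a tiny illustration of that computation, with if producing a union and is_a? narrowing it:

x = rand < 0.5 ? 1 : "one"
puts typeof(x) # => (Int32 | String)

if x.is_a?(Int32)
  # the compiler narrows x inside this branch
  puts typeof(x) # => Int32
end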

That said, for a given program we could remember which were the instantiations, and later when you compile that same program we could try to reuse those instantiations. The tricky part is determining when we can reuse them.

3 Likes

Thanks very much, this is really helpful! I’m also looking forward to watching the talk at the Chicago meetup!

Oh great sage @asterite, gentlest of key fondlers, please offer wisdom to this poor Rails abuser.

If I were implementing incremental compilation, I’d view it as an application caching problem. Why would that not work for a compiler?

Example:

# In foo.cr
def foo(arg)
  bar(arg)
end

# In bar.cr
def bar(arg)
  p typeof(arg)
end

# In controller.cr
foo 1

# In model.cr
foo "2"

When bar.cr is changed, it recursively invalidates all dependents: [foo.cr, controller.cr, model.cr].

Most of the time is spent working on application-level changes. The shards are fixed and argument types don’t change often, which should make the majority of method calls cacheable.

Where have I strayed in my thinking?

You didn’t stray in your thinking. I think what you are saying is possible; we just need to formally specify what a method’s dependencies are and when they can change.

For example, if I define a new file baz.cr that I now require, and do this:

def bar(x : Int32)
  "haha!"
end

Well, now foo.cr and controller.cr need to be recompiled, because foo 1 now resolves to this new overload, even though bar.cr didn’t change.

There are many cases like that which make it hard.

macro some_macro
  {{ SomeType.subclasses.size }}
end

def bar
  some_macro
end

Now whenever someone subclasses SomeType we need to recompile bar.

We also have macro run, which allows arbitrary execution of code at compile time. Can we avoid running it over and over on each compilation? I don’t know.
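
For reference, a tiny sketch of run (the file names are hypothetical):

# gen.cr: a program that `run` compiles and executes at compile
# time; whatever it prints is pasted into the main program as code.
puts 40 + 2

# main.cr
ANSWER = {{ run("./gen") }}
puts ANSWER # => 42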

Then the killer problem is generics. Imagine this:

module Moo
  abstract def moo
end

class Foo
  include Moo

  def moo
    puts "moo!"
  end
end

class MooHolder
  def initialize(@moo : Moo)
  end

  def moo
    @moo.moo
  end
end

MooHolder.new(Foo.new).moo

When do we need to recompile MooHolder#moo? It seems that if a new type includes Moo we need to recompile it. For example:

class Bar
  include Moo

  def moo
    puts "Miau!"
  end
end

MooHolder.new(Bar.new).moo

However, what if a generic type includes Moo?

class Gen(T)
  include Moo

  def initialize(@x : T)
  end

  def moo
    @x.moo
  end
end

The way modules work in Crystal, the compiler needs to know all of the including types, and all actual instances, to be able to do the multi-dispatch. But how do you know the actual instances? For example if I add somewhere deep in my code:

MooHolder.new(Gen.new(1))

then MooHolder#moo should be recomputed. And to do that we essentially need to type the entire program to find these occurrences.

So to be able to cache things we need to type the entire program. Hmm…?

We went over similar discussions with Brian and Juan in an office room in Manas like 3 or 4 years ago (probably more, because time passes so fast). One idea was that we could require method return types for generics if they include a module that is used like that. But we weren’t sure that was all that needed to change, or how annoying it would be (what if someone includes a module in a type of your own?).

Anyway, if someone wants to go ahead and try to tackle this, go ahead; just beware that the language is actually pretty complex and “dynamic” at compile time, so computing the dependencies of a method is not a trivial task.

2 Likes