Crystal and Go

I’ll try to briefly explain the problem.

Let’s say you have a class Foo:

class Foo
  def foo
    some_call
  end
end

Is that valid code? It is! Well, it compiles. It’s only when you actually call it that you get an error:

Foo.new.foo # undefined local variable or method 'some_call' for Foo

However, if that file is required by another file that also defines some_call as a method returning 1, everything compiles fine.
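For instance, something like this compiles (main.cr and foo.cr are hypothetical file names):

# main.cr
require "./foo" # the file containing the Foo class above

def some_call
  1
end

Foo.new.foo # => 1; some_call is now found at the top level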

This is the first problem: a file isn’t self-contained, and it doesn’t mention what (or where) its dependencies are, either. Heck, we can’t even know the type of foo until someone calls it (here we might, because it has no arguments, but imagine it had an untyped argument).
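To see why untyped arguments make this worse, here’s a minimal sketch (double is a made-up method): its return type is different at every call site, so there’s no single type to cache for it.

def double(x)
  x + x
end

p typeof(double(1))   # => Int32
p typeof(double("a")) # => String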

So, if we want to do incremental compilation for the semantic phase, we need to somehow know that foo depends on a method that, in one particular compilation, is located in one file. On a subsequent compilation we probably need to check whether that method is still defined in that file or somewhere else, whether it was redefined, and whether its type changed (and how do we know that, if we don’t know how it’s going to be called?).

That’s only the tip of the iceberg, though.

Let’s say someone keeps an instance var of type Foo. If someone subclasses Foo like this:

class Bar < Foo
  def foo
    "hello"
  end
end

Now that @foo.foo method call must dispatch on Foo or Bar. Right now the compiler must know all the subclasses of a type to do the dispatch; it doesn’t use virtual tables. A virtual table basically says: the method Foo#foo is located at a particular position in memory for an object, always at the same location, so we can blindly call it at runtime and expect that to work. But here the method is overridden and returns a different type than the base method! So it’s essentially impossible to even think about a virtual table, because we really need to know all the subclasses.
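Here’s a minimal sketch of that dispatch (assuming Foo#foo simply returns 1 instead of calling some_call):

class Foo
  def foo
    1
  end
end

class Bar < Foo
  def foo
    "hello"
  end
end

# The compile-time type of `foo` is Foo, meaning "Foo or any subclass",
# so the call must consider every override the compiler has seen:
foo = [Foo.new, Bar.new].first
p typeof(foo.foo) # => (Int32 | String)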

And then, what happens if I subclass Foo with a generic:

class Gen(T) < Foo
  def foo
    T.foo
  end
end

Now the way each foo works even changes based on what T is. Look at this:

class Foo
  def foo
    1
  end
end

class Gen(T) < Foo
  def foo
    T.foo
  end
end

struct Int32
  def self.foo
    'a'
  end
end

class String
  def self.foo
    "a"
  end
end

foos = [Foo.new, Gen(Int32).new, Gen(String).new] of Foo
p! typeof(foos[0]) # => Foo
p! typeof(foos[0].foo) # => (Char | Int32 | String)

Notice that the type of an array element is Foo, which, for the compiler, means “Foo or any of its subtypes”.

Also note how the compiler knows that the type of the foo call is exactly the union of the return types of Foo#foo and of every generic instantiation in the current program (try doing that in any other compiled language: there’s simply no way, unless T implements an interface and foo always returns the same type).

Now let’s try adding some more code at the bottom:

class Foo
  def foo
    1
  end
end

class Gen(T) < Foo
  def foo
    T.foo
  end
end

struct Int32
  def self.foo
    'a'
  end
end

class String
  def self.foo
    "a"
  end
end

foos = [Foo.new, Gen(Int32).new, Gen(String).new] of Foo
p! typeof(foos[0]) # => Foo
p! typeof(foos[0].foo) # => (Bool | Char | Int32 | String)

# Here comes the new code

struct Bool
  def self.foo
    true
  end
end

def bar
  Gen(Bool)
end

bar

Note that the array of Foos is created and type-checked before the call to bar is type-checked, which is when the compiler finds out that, oh, Gen was actually instantiated with a new type, Bool, and Bool.foo returns a Bool, so the type of the foo call must now also include Bool.

The compiler has to go back to the type it concluded for foo and say “well, this can actually also be Bool”, and then process everything again from that point. This is all actually done with bindings: when the type of an ASTNode changes, all the dependent nodes are recomputed (calls are recomputed too!). Note also that we need hooks for when a new generic type is instantiated, to know which AST nodes to recompute.

The hard part is: how do we quickly know where all the Gen instantiations are? What if there’s a Gen(typeof(something))? We basically have to analyze the entire program, every time, from scratch, to know this.
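For example, using the Gen above (make_default is a made-up method), the type argument is only known once other code has been typed, so a purely syntactic scan of the source can’t find the instantiation:

def make_default
  0
end

# typeof(make_default) is only known after make_default is typed,
# yet it instantiates Gen(Int32) just like a literal Gen(Int32) would:
g = Gen(typeof(make_default)).new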

Or at least that’s what I think.

If someone has an idea of how to make all of this more efficient, please let us know.

Also understand that the way the language works is our vision, and that vision doesn’t exist in any other language out there. It actually feels a bit magical how everything works so well (well, there are a lot of bugs around generic inheritance, and maybe now you understand why). But that vision doesn’t align with efficient or modular compilation.

If we want modular or efficient compilation we need to change and start restricting the language. For example: if you inherit from Foo and override foo, you must return the same type as the base method. But wait, foo has no declared return type. So maybe we can make that required. But how do we know who’s going to override which method? So let’s make all return types required. Actually, let’s make all input and output types of methods required (this is what all statically typed languages do, even Haskell and other functional languages, because they don’t have overloads). If we keep changing the language in that direction, we basically end up with Java, C#, or Go, but with Ruby syntax. Probably not a very interesting language, nor an interesting addition to the language landscape.
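A sketch of what that restricted style could look like (hypothetical: today’s Crystal accepts these annotations but doesn’t force an override to keep the base method’s return type):

class Foo
  def foo : Int32
    1
  end
end

class Bar < Foo
  # Under the restriction, returning "hello" here would be a
  # compile-time error: the override must keep the base return type.
  def foo : Int32
    2
  end
end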

My hope is that, with time and effort, some crazy (in the good sense) people will come along and find ways to optimize the compiler, tricks to make it work faster. Ruby 1.8 was very slow. Ruby 1.9 got a revamped VM, and now they’re doing JIT stuff. So there’s always a chance to improve things. But right now I feel this task is incredibly hard.

Edit: that’s funny, now I see I started this post with “briefly explain the problem”.
