The true cost of parsing Crystal source code

Passing --stats to Crystal when building or running any source code prints a list of compilation stages and the time elapsed in each, the first stage being Parse. Most of the time this stage takes less than 1 millisecond, even when rebuilding a codebase as large as the compiler itself. That is, of course, impossible, no matter how fast your computer is.
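Both invocations accept the flag; for example (the file name here is just a placeholder):

$ crystal build hello.cr --stats
$ crystal run hello.cr --stats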

The reason is that this stage has a very narrow scope: it measures only the time taken to parse the entry points given on the command line, not the time taken for required files. Those are parsed as Require nodes are visited by the top-level visitor, so all of that elapsed time counts towards the Semantic (top level) stage instead. Knowing that all parsing goes through Crystal::Parser#parse, here is a rudimentary patch that prints the total time taken across all invocations of that method:

module Crystal
  class Parser
    # Accumulated wall-clock time across every Parser#parse call
    class_property elapsed = Time::Span.zero

    def parse
      t0 = Time.monotonic
      begin
        previous_def # run the original #parse
      ensure
        # count the elapsed time even if parsing raises
        t1 = Time.monotonic
        Parser.elapsed += t1 - t0
      end
    end
  end

  class Compiler
    private def print_codegen_stats(units)
      return unless @progress_tracker.stats?
      # append the grand total to the regular --stats output
      puts "Parse total: #{Parser.elapsed}"
      previous_def
    end
  end
end

We then build an empty file with --stats, so that the prelude alone is responsible for all the parsing.
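Assuming the empty file is named empty.cr (the name is just a placeholder):

$ crystal build empty.cr --stats

On Windows, a typical run looks like this: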

Parse:                             00:00:00.000625400 (   1.16MB)
Semantic (top level):              00:00:00.371690400 ( 129.64MB)
Semantic (new):                    00:00:00.001411500 ( 129.64MB)
Semantic (type declarations):      00:00:00.022352500 ( 129.64MB)
Semantic (abstract def check):     00:00:00.008010900 ( 145.64MB)
Semantic (restrictions augmenter): 00:00:00.006648800 ( 145.64MB)
Semantic (ivars initializers):     00:00:00.095378200 ( 145.70MB)
Semantic (cvars initializers):     00:00:00.003065200 ( 145.70MB)
Semantic (main):                   00:00:00.212487200 ( 193.82MB)
Semantic (cleanup):                00:00:00.000268600 ( 193.82MB)
Semantic (recursive struct check): 00:00:00.000578700 ( 193.82MB)
Codegen (crystal):                 00:00:00.199303900 ( 193.82MB)
Codegen (bc+obj):                  00:00:00.182948200 ( 209.82MB)
Codegen (linking):                 00:00:00.714077300 ( 209.82MB)
Parse total: 00:00:00.268039300

0.27 seconds, or over 70% of the top-level phase (0.268s out of 0.372s), is spent parsing source code. That may seem small compared to the main and codegen phases, but it definitely contributes to the slowness perceived by some newcomers running their first Hello World program and other short scripts. It also dominates compiler components that only require the top-level semantic pass, such as the docs generator and the hierarchy tool. Below is a flame graph (captured on Debian under WSL) showing the same symptom:

The graph is focused on the outermost stack frame involving the top-level visitor, and all the highlighted frames are instance methods of Crystal::Parser. They indeed occupy the majority of the top-level phase.

Traditionally, much of the work on improving compilation times has been directed at the codegen phases, while the parser has received little attention. There are probably many missed optimization opportunities here, and we may also want better reporting facilities for --stats, not least because the stages of interest are not perfectly sequential: parsing can happen in the middle of the main phase too, for macro methods referring to the @type macro variable.
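For illustration, here is a minimal sketch of my own (not taken from the compiler's source) of such a macro method; because the body refers to @type, it is expanded, and its expansion re-parsed, once per concrete type during the main phase:

class Object
  # The macro expression below mentions @type, so this body is expanded
  # separately for each type the method is called on, and each expansion
  # goes through the parser again during Semantic (main).
  def type_name : String
    {{ @type.name.stringify }}
  end
end

puts 1.type_name   # => "Int32"
puts "x".type_name # => "String"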

19 Likes

Yeah, I guess it’s time to end the tale of the compiler’s incredible parse efficiency :laughing:
Thanks for digging into this.

Still, parsing consumes only a small portion of the overall compile time (I guess ~10-15%).

Another consideration is growth behaviour: parsing scales strictly in proportion to the size of the codebase, but the complexity of semantics (and, to a lesser extent, codegen) grows exponentially.
As a result, improvements in parsing have less effect on the overall compilation time for large codebases than for smaller ones.

This might put things in perspective a bit. But it doesn't mean we should not try to optimize the parser. Any improvements are helpful, especially if there is any low-hanging fruit.

Ah, but each crystal spec run is a compilation, and if you structure your specs into properly bite-sized pieces, the cumulative effect of faster small-codebase compile times will have a significant impact on DX if you run just the spec(s) for the part you're working on rather than the whole suite.
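For example, something like (the spec path here is hypothetical):

$ crystal spec spec/parser_spec.cr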

So might be worth expending some effort?

In the compile example given by Hertzdevil, parsing (0.27s) took 9.5% of the compilation time (2.81s total).

In any real application, no matter how small your specs are, the total time is going to be longer than this, but the parsing time won't be, meaning it's always going to take <9.5% of the total build time.
Even if the time it takes to parse the source were reduced to 0 seconds, you'd be saving 0.27 seconds, which is hardly something you'd notice in anything other than a simple Hello World or scriptlet :confused:

@Barney The total build time is 1.8s, and half of that is LLVM codegen + link time (0.9s), where we can't do much, so 0.27s is in fact 29% of the time spent in the compiler's own code.

1 Like

2 maths degrees and I still can’t use a calculator :see_no_evil_monkey: Thank you for correcting me, that sounds like a more substantial difference.

What was used to generate that graph?

Debian linux-perf and:

# build a release compiler that keeps frame pointers, so perf can unwind the stack
$ make clean crystal release=1 FLAGS=--frame-pointers=always
# sample at 1000 Hz with call graphs while the compiler analyses its own source
$ perf record -F 1000 -g -- bin/crystal build --no-codegen src/compiler/crystal.cr
# render the recorded samples as an interactive flame graph
$ perf script report flamegraph -o flamegraph.html --allow-download

3 Likes