Js and java are faster in my test, how can that be?

Solve difficult sudoku as a performance test for different programming languages

The work on this started in 2013 and I´ve done it in a lot of programming languages since then.

The concept is a recursive algorithm that need as little memory as possible.
It manipulates the field directly. The rules are made of arrays, so this solution is quite minimal.

See

The tests ard made on macOs on macbook pro 15 from early 2013.

lang time exe-size
c (clang) 7 ms 9 KB
rust 6 ms 320 KB
cystal 32 ms 420 KB
js Chorme 24 ms
js Firefox 21 ms
java 24 ms
lua 1,7 s
python3.7 2.7 s
module Sudoku
  VERSION           = "0.1.0"
  RULE_ROWS         = (0..8).cycle.first(9*9).to_a
  RULE_COLUMS       = (0..80).map { |x| x // 9 }.to_a
  RULE_BASIC_BLOCKS =
    (0..80).map { |x| (x % 9 // 3) + ((x//27) * 3) % 9 }.to_a
  RULES = [RULE_ROWS, RULE_COLUMS, RULE_BASIC_BLOCKS]

  def self.print_board(board : Array(Int32))
    puts ""
    board.each_with_index do |x, i|
      print " #{x.zero? ? "_" : x}"
      print " " if i %  3 ==  2
      puts ""   if i %  9 ==  8
      puts ""   if i % 27 == 26
    end
  end

  private def self.set_field?(index : Int32, board : Array(Int32), set_mask : Array(Int32))
    field_no = index
    loop do
      return true if field_no >= 81
      break if board[field_no].zero?
      field_no += 1
    end
    (1..9).each do |i|
      if ((1 << i) & set_mask[field_no]).zero?
        board[field_no] = i
        new_set_mask = set_mask.clone
        if RULES.all? { |rule| valid?(board, rule, new_set_mask) }
          if set_field?(field_no + 1, board, new_set_mask)
            return true
          end
        end
      end
    end
    board[field_no] = 0
    false
  end

  private def self.valid?(board, rule, set_mask)
    flags = Array.new(9, 0)
    board.zip(rule).each do |(f, r)|
      next if f.zero?
      mask = 1 << f
      return false if !(flags[r] & mask).zero?
      flags[r] |= mask
    end
    rule.each_with_index do |r, i|
      set_mask[i] |= flags[r] # mark numbers that are taken for this rule
    end
    true
  end

  def self.solve?(board : Array(Int32))
    set_mask = Array.new(81, 0)
    set_field?(0, board, set_mask)
   end

  def self.main
    field = [
      1, 0, 0,  0, 0, 7,  0, 9, 0,
      0, 3, 0,  0, 2, 0,  0, 0, 8,
      0, 0, 9,  6, 0, 0,  5, 0, 0,

      0, 0, 5,  3, 0, 0,  9, 0, 0,
      0, 1, 0,  0, 8, 0,  0, 0, 2,
      6, 0, 0,  0, 0, 4,  0, 0, 0,

      3, 0, 0,  0, 0, 0,  0, 1, 0, 
      0, 4, 0,  0, 0, 0,  0, 0, 7,
      0, 0, 7,  0, 0, 0,  3, 0, 0
    ]

    print_board(field)
    start_time = Time.local
    result = solve?(field)
    stop_time = Time.local
    puts "Millisecounds: #{(stop_time - start_time).nanoseconds() / 1000_000.0}"
    if result
      print_board(field)
    else
      puts "no solition found"
    end
  end

  main
end
1 Like

inb4 the code is crystal’d by lead devs and roffle stomps that chart :smiley:

Java and JavaScript have some very sophisticated virtual machines which can lead to pretty good performance with JIT compilation and other optimizations. It obviously depends on the specifics of the use case. And generally, Crystal should usually outperform them.

First of all, did you compile with --release flag? Otherwise the code won’t be optimized.

In .valid? there is an unnecessary heap allocation with board.zip(rule).each do |(f, r)|. This is probably just a minor issue, but this seems to be frequently executed. You should use board.zip(rule) do |f, r| instead to avoid instantiating an iterator.

1 Like

Also, for measuring elapsed time, please use Time.monotonic (or the convenient Time.measure helper). This won’t improve performance but make sure that the measurements are taken correctly.

Thank you.

Of course I run it with --release.
Remove the .each as you describe reduces the execution time to 15 ms!!!. but rust is twice as fast in this case.

crystal is now faster than js and java and this fits my expectations. Are there Arrays with fixed size on the stack?

If someone has other tips the improvements, please.

I don’t understand why, but with these changes the code runs in 0.14ms:

# TODO: Write documentation for `Sudoku`
module Sudoku
  VERSION   = "0.1.0"
  RULE_ROWS = StaticArray[
    0, 0, 0, 0, 0, 0, 0, 0, 0,
    1, 1, 1, 1, 1, 1, 1, 1, 1,
    2, 2, 2, 2, 2, 2, 2, 2, 2,
    3, 3, 3, 3, 3, 3, 3, 3, 3,
    4, 4, 4, 4, 4, 4, 4, 4, 4,
    5, 5, 5, 5, 5, 5, 5, 5, 5,
    6, 6, 6, 6, 6, 6, 6, 6, 6,
    7, 7, 7, 7, 7, 7, 7, 7, 7,
    8, 8, 8, 8, 8, 8, 8, 8, 8,
  ]
  RULE_COLUMS       = (0..80).map { |x| x // 9 }.to_a
  RULE_BASIC_BLOCKS =
    (0..80).map { |x| (x % 9 // 3) + ((x//27) * 3) % 9 }.to_a
  RULES = [RULE_ROWS, RULE_COLUMS, RULE_BASIC_BLOCKS]

  def self.print_board(board : Array(Int32))
    puts ""
    board.each_with_index do |x, i|
      print " #{x.zero? ? "_" : x}"
      print " " if i % 3 == 2
      puts "" if i % 9 == 8
      puts "" if i % 27 == 26
    end
  end

  private def self.set_field?(index : Int32, board : Array(Int32), set_mask : Array(Int32))
    field_no = index
    loop do
      return true if field_no >= 81
      break if board[field_no].zero?
      field_no += 1
    end
    (1..9).each do |i|
      if ((1 << i) & set_mask[field_no]).zero?
        board[field_no] = i
        new_set_mask = set_mask.dup
        if RULES.all? { |rule| valid?(board, rule, new_set_mask) }
          if set_field?(field_no + 1, board, new_set_mask)
            return true
          end
        end
      end
    end
    board[field_no] = 0
    false
  end

  private def self.valid?(board, rule, set_mask)
    flags = StaticArray(Int32, 9).new(0)
    board.zip(rule) do |f, r|
      next if f.zero?
      mask = 1 << f
      return false if !(flags[r] & mask).zero?
      flags[r] |= mask
    end
    rule.each_with_index do |r, i|
      set_mask[i] |= flags[r] # mark numbers that are taken for this rule
    end
    true
  end

  def self.solve?(board : Array(Int32))
    set_mask = Array.new(81, 0)
    set_field?(0, board, set_mask)
  end

  def self.main
    field = [
      1, 0, 0, 0, 0, 7, 0, 9, 0,
      0, 3, 0, 0, 2, 0, 0, 0, 8,
      0, 0, 9, 6, 0, 0, 5, 0, 0,

      0, 0, 5, 3, 0, 0, 9, 0, 0,
      0, 1, 0, 0, 8, 0, 0, 0, 2,
      6, 0, 0, 0, 0, 4, 0, 0, 0,

      3, 0, 0, 0, 0, 0, 0, 1, 0,
      0, 4, 0, 0, 0, 0, 0, 0, 7,
      0, 0, 7, 0, 0, 0, 3, 0, 0,
    ]

    print_board(field)
    start_time = Time.monotonic
    result = solve?(field)
    stop_time = Time.monotonic
    puts "Millisecounds: #{(stop_time - start_time).nanoseconds / 1000_000.0}"
    if result
      print_board(field)
    else
      puts "no solition found"
    end
  end

  main
end

Moving more things to static array makes it worse (like 12~15ms).

My advice is, if you are going to benchmark two languages, it’s better to write the exact same code. In Crystal you initialize the arrays doing some array manipulation, mapping, etc., while in Java you use static arrays with constants (no need to compute anything), plus I’m almost sure Java will optimize that away to static (immutable) arrays.

1 Like

You are also using byte in Java and Int32 in Crystal.

The benchmarks aren’t comparing the same code at all. But, I think once you change zip to avoid creating an intermediate array the difference is 9ms Java vs 15ms Crystal and I think that’s pretty acceptable, given that the Crystal version is objectively worse (coded with less performance in mind) than Java.

1 Like

Actually, the code I provided gives a different (I think wrong) result. In any case, what I said about different codes being benchmarked still holds.

There was a talk recently, I can’t remember where, in StrangeLoop or similar, about how little changes in code layout can change drastically how code behaves. So I wouldn’t rely on these micro-benchmarks for anything.

Thank you !! I really enjoy programming in crystal.

I don´t like the java code, this might be better, with iterators. And this is done years ago from a friend.

I was originally compared to rust or other compiled languages like c. The crystal implementation is really similar to the rust.

I switched to UInt8 and the UInt16 for the StaticArray() and now I got down to 9.5 ms.
So everything is fine. crystal has a GC, and a heap objects so I expect rust to be a little bit faster.

Oh if you got no solution you might have done something wrong, and the time is quite short.

1 Like

When you do intend to benchmark with Go programming too?

You can profile it: https://github.com/crystal-lang/crystal-book/blob/master/guides/performance.md and try and see where the bottlenecks are… :slight_smile:

“Performance Matters” by Emery Berger

I think it is this one? Amazing talk on performance, it just open my eyes how little I knew about computing.

2 Likes

Indeed, that talk is crazy! After watching it I don’t trust microbenchmarks anymore.

2 Likes

Thank you for sharing this mind-blowing talk!
Everyone who is interested in performance go watch this, it’s worth every second of your time.

Can we make coz work with Crystal?

It looks like there is only a C and Java versions of coz profiler currently https://github.com/plasma-umass/coz

If Crystal can emit C code via LLVM and mark corresponding sections in C code with COZ_PROGRESS_NAMED, COZ_BEGIN/COZ_END macros https://github.com/plasma-umass/coz then probably it can be usable.

The coz.h which is actually part of the executable is definitely able to be rewritten in crystal using the C bindings.

There’s no need to compile to C at all.

Wrote it :)

Unfortunately it’s very limited by the incredibly inaccurate debug info which crystal emits.

4 Likes

@RX14 I am still working on it ;)
I am trying to fix line locations first after that will implement union debug types. nullable types are already working in my branch.

2 Likes

This website is interesting for performance testing of languages
https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/nbody-cpu.html
It could be interesting to implement these programs in crystal too and get listed.

But some of them are really heavy optimized and uses, for instance, SIMD Cpu instructions directly.
There are also Ruby solutions.

1 Like