# Solve difficult sudoku as a performance test for different programming languages

The work on this started in 2013 and I´ve done it in a lot of programming languages since then.

The concept is a recursive algorithm that need as little memory as possible.
It manipulates the field directly. The rules are made of arrays, so this solution is quite minimal.

See

The tests ard made on macOs on macbook pro 15 from early 2013.

lang time exe-size
c (clang) 7 ms 9 KB
rust 6 ms 320 KB
cystal 32 ms 420 KB
js Chorme 24 ms
js Firefox 21 ms
java 24 ms
lua 1,7 s
python3.7 2.7 s
``````module Sudoku
VERSION           = "0.1.0"
RULE_ROWS         = (0..8).cycle.first(9*9).to_a
RULE_COLUMS       = (0..80).map { |x| x // 9 }.to_a
RULE_BASIC_BLOCKS =
(0..80).map { |x| (x % 9 // 3) + ((x//27) * 3) % 9 }.to_a
RULES = [RULE_ROWS, RULE_COLUMS, RULE_BASIC_BLOCKS]

def self.print_board(board : Array(Int32))
puts ""
board.each_with_index do |x, i|
print " #{x.zero? ? "_" : x}"
print " " if i %  3 ==  2
puts ""   if i %  9 ==  8
puts ""   if i % 27 == 26
end
end

private def self.set_field?(index : Int32, board : Array(Int32), set_mask : Array(Int32))
field_no = index
loop do
return true if field_no >= 81
break if board[field_no].zero?
field_no += 1
end
(1..9).each do |i|
if ((1 << i) & set_mask[field_no]).zero?
board[field_no] = i
if RULES.all? { |rule| valid?(board, rule, new_set_mask) }
if set_field?(field_no + 1, board, new_set_mask)
return true
end
end
end
end
board[field_no] = 0
false
end

flags = Array.new(9, 0)
board.zip(rule).each do |(f, r)|
next if f.zero?
return false if !(flags[r] & mask).zero?
end
rule.each_with_index do |r, i|
set_mask[i] |= flags[r] # mark numbers that are taken for this rule
end
true
end

def self.solve?(board : Array(Int32))
end

def self.main
field = [
1, 0, 0,  0, 0, 7,  0, 9, 0,
0, 3, 0,  0, 2, 0,  0, 0, 8,
0, 0, 9,  6, 0, 0,  5, 0, 0,

0, 0, 5,  3, 0, 0,  9, 0, 0,
0, 1, 0,  0, 8, 0,  0, 0, 2,
6, 0, 0,  0, 0, 4,  0, 0, 0,

3, 0, 0,  0, 0, 0,  0, 1, 0,
0, 4, 0,  0, 0, 0,  0, 0, 7,
0, 0, 7,  0, 0, 0,  3, 0, 0
]

print_board(field)
start_time = Time.local
result = solve?(field)
stop_time = Time.local
puts "Millisecounds: #{(stop_time - start_time).nanoseconds() / 1000_000.0}"
if result
print_board(field)
else
puts "no solition found"
end
end

main
end
``````
1 Like

inb4 the code is crystal’d by lead devs and roffle stomps that chart

Java and JavaScript have some very sophisticated virtual machines which can lead to pretty good performance with JIT compilation and other optimizations. It obviously depends on the specifics of the use case. And generally, Crystal should usually outperform them.

First of all, did you compile with `--release` flag? Otherwise the code won’t be optimized.

In `.valid?` there is an unnecessary heap allocation with `board.zip(rule).each do |(f, r)|`. This is probably just a minor issue, but this seems to be frequently executed. You should use `board.zip(rule) do |f, r|` instead to avoid instantiating an iterator.

1 Like

Also, for measuring elapsed time, please use `Time.monotonic` (or the convenient `Time.measure` helper). This won’t improve performance but make sure that the measurements are taken correctly.

Thank you.

Of course I run it with --release.
Remove the `.each` as you describe reduces the execution time to 15 ms!!!. but rust is twice as fast in this case.

crystal is now faster than js and java and this fits my expectations. Are there Arrays with fixed size on the stack?

If someone has other tips the improvements, please.

I don’t understand why, but with these changes the code runs in 0.14ms:

``````# TODO: Write documentation for `Sudoku`
module Sudoku
VERSION   = "0.1.0"
RULE_ROWS = StaticArray[
0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1,
2, 2, 2, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6, 6,
7, 7, 7, 7, 7, 7, 7, 7, 7,
8, 8, 8, 8, 8, 8, 8, 8, 8,
]
RULE_COLUMS       = (0..80).map { |x| x // 9 }.to_a
RULE_BASIC_BLOCKS =
(0..80).map { |x| (x % 9 // 3) + ((x//27) * 3) % 9 }.to_a
RULES = [RULE_ROWS, RULE_COLUMS, RULE_BASIC_BLOCKS]

def self.print_board(board : Array(Int32))
puts ""
board.each_with_index do |x, i|
print " #{x.zero? ? "_" : x}"
print " " if i % 3 == 2
puts "" if i % 9 == 8
puts "" if i % 27 == 26
end
end

private def self.set_field?(index : Int32, board : Array(Int32), set_mask : Array(Int32))
field_no = index
loop do
return true if field_no >= 81
break if board[field_no].zero?
field_no += 1
end
(1..9).each do |i|
if ((1 << i) & set_mask[field_no]).zero?
board[field_no] = i
if RULES.all? { |rule| valid?(board, rule, new_set_mask) }
if set_field?(field_no + 1, board, new_set_mask)
return true
end
end
end
end
board[field_no] = 0
false
end

flags = StaticArray(Int32, 9).new(0)
board.zip(rule) do |f, r|
next if f.zero?
return false if !(flags[r] & mask).zero?
end
rule.each_with_index do |r, i|
set_mask[i] |= flags[r] # mark numbers that are taken for this rule
end
true
end

def self.solve?(board : Array(Int32))
end

def self.main
field = [
1, 0, 0, 0, 0, 7, 0, 9, 0,
0, 3, 0, 0, 2, 0, 0, 0, 8,
0, 0, 9, 6, 0, 0, 5, 0, 0,

0, 0, 5, 3, 0, 0, 9, 0, 0,
0, 1, 0, 0, 8, 0, 0, 0, 2,
6, 0, 0, 0, 0, 4, 0, 0, 0,

3, 0, 0, 0, 0, 0, 0, 1, 0,
0, 4, 0, 0, 0, 0, 0, 0, 7,
0, 0, 7, 0, 0, 0, 3, 0, 0,
]

print_board(field)
start_time = Time.monotonic
result = solve?(field)
stop_time = Time.monotonic
puts "Millisecounds: #{(stop_time - start_time).nanoseconds / 1000_000.0}"
if result
print_board(field)
else
puts "no solition found"
end
end

main
end
``````

Moving more things to static array makes it worse (like 12~15ms).

My advice is, if you are going to benchmark two languages, it’s better to write the exact same code. In Crystal you initialize the arrays doing some array manipulation, mapping, etc., while in Java you use static arrays with constants (no need to compute anything), plus I’m almost sure Java will optimize that away to static (immutable) arrays.

1 Like

You are also using `byte` in Java and `Int32` in Crystal.

The benchmarks aren’t comparing the same code at all. But, I think once you change `zip` to avoid creating an intermediate array the difference is 9ms Java vs 15ms Crystal and I think that’s pretty acceptable, given that the Crystal version is objectively worse (coded with less performance in mind) than Java.

1 Like

Actually, the code I provided gives a different (I think wrong) result. In any case, what I said about different codes being benchmarked still holds.

There was a talk recently, I can’t remember where, in StrangeLoop or similar, about how little changes in code layout can change drastically how code behaves. So I wouldn’t rely on these micro-benchmarks for anything.

Thank you !! I really enjoy programming in crystal.

I don´t like the java code, this might be better, with iterators. And this is done years ago from a friend.

I was originally compared to rust or other compiled languages like c. The crystal implementation is really similar to the rust.

I switched to UInt8 and the UInt16 for the StaticArray() and now I got down to 9.5 ms.
So everything is fine. crystal has a GC, and a heap objects so I expect rust to be a little bit faster.

Oh if you got no solution you might have done something wrong, and the time is quite short.

1 Like

When you do intend to benchmark with Go programming too?

You can profile it: https://github.com/crystal-lang/crystal-book/blob/master/guides/performance.md and try and see where the bottlenecks are…

# “Performance Matters” by Emery Berger

I think it is this one? Amazing talk on performance, it just open my eyes how little I knew about computing.

2 Likes

Indeed, that talk is crazy! After watching it I don’t trust microbenchmarks anymore.

2 Likes

Thank you for sharing this mind-blowing talk!
Everyone who is interested in performance go watch this, it’s worth every second of your time.

Can we make coz work with Crystal?

It looks like there is only a C and Java versions of coz profiler currently https://github.com/plasma-umass/coz

If Crystal can emit C code via LLVM and mark corresponding sections in C code with `COZ_PROGRESS_NAMED`, `COZ_BEGIN`/`COZ_END` macros https://github.com/plasma-umass/coz then probably it can be usable.

The `coz.h` which is actually part of the executable is definitely able to be rewritten in crystal using the C bindings.

There’s no need to compile to C at all.

Wrote it :)

Unfortunately it’s very limited by the incredibly inaccurate debug info which crystal emits.

4 Likes

@RX14 I am still working on it ;)
I am trying to fix line locations first after that will implement union debug types. nullable types are already working in my branch.

2 Likes

This website is interesting for performance testing of languages
https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/nbody-cpu.html
It could be interesting to implement these programs in crystal too and get listed.

But some of them are really heavy optimized and uses, for instance, SIMD Cpu instructions directly.
There are also Ruby solutions.

1 Like