I was reading the following blog post written by @asterite. The content below is copied from there.
require "big"
def fib(n)
if n <= 1
BigInt.new(1)
else
fib(n - 1) + fib(n - 2)
end
end
time = Time.local
puts fib(BigInt.new(46))
puts Time.local - time
Then when I run the same code with Crystal 1.6-dev, it is much slower than the example above. In fact, I ran it several times and broke it off early every time after 8 minutes.

So I guess something must have changed that makes Crystal not as fast as before.

I think we really need a benchmark like this, but one that compares the language not to other languages, just to itself, to find this kind of issue.
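For example, a small fixed script like the following could be built and timed under each compiler release, so a regression between versions shows up immediately (a minimal sketch; bench_fib.cr is just a name I am using here):

require "big"

def fib(n)
  if n <= 1
    BigInt.new(1)
  else
    fib(n - 1) + fib(n - 2)
  end
end

# Build and run this same file under every compiler version to compare:
#   crystal build --release bench_fib.cr && ./bench_fib
elapsed = Time.measure { fib(BigInt.new(30)) }
puts "fib(30) took #{elapsed.total_seconds.round(3)}s"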
That said, after applying some Array/Tuple and puts tricks, the performance is better. But I think we have to investigate why this code, even unchanged, could reach that kind of performance back in 2017; if we had used those tricks at the time, it would still beat Crystal 1.4 by a lot, wouldn't it?

BTW: the code above has now run for more than 18 minutes … still no result. That's 8x slower than before.
The M1 is a beast. It’s super fast. The compiler compiles for me in like 12 seconds, while on other machines I heard it takes like 45 seconds.
In any case, the performance issue with BigInt is known.
One way to improve this is by representing BigInt internally as a union of a primitive value (say Int64) and a LibGMP::MPZ instance. Then if you do BigInt.new(1) it's just an Int64 under the hood. If we add one to it, it's still an Int64. But once it exceeds the limit, only then would we call libgmp.
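A rough sketch of how that could look (HybridInt and its method names are hypothetical, for illustration only, not the actual stdlib design):

require "big"

# Hypothetical sketch: hold an Int64 until a value overflows,
# and only then promote to a libgmp-backed BigInt.
struct HybridInt
  def initialize(@value : Int64 | BigInt)
  end

  protected def value
    @value
  end

  def +(other : HybridInt) : HybridInt
    a = @value
    b = other.value
    if a.is_a?(Int64) && b.is_a?(Int64)
      begin
        # Fast path: checked Int64 addition; no libgmp call, no heap allocation.
        return HybridInt.new(a + b)
      rescue OverflowError
        # The sum no longer fits in Int64: fall through and promote.
      end
    end
    # Slow path: at least one operand is (or the sum became) too big for Int64.
    HybridInt.new(a.to_big_i + b.to_big_i)
  end

  def to_s(io : IO) : Nil
    io << @value
  end
end

one = HybridInt.new(1_i64)
puts one + one                       # => 2, still an Int64 internally
puts HybridInt.new(Int64::MAX) + one # => 9223372036854775808, promoted to BigInt

Small values never touch libgmp or the GC heap this way; only the overflow path pays for a real BigInt.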
Although Ruby's bigint in general seems to perform better than Crystal's.
If one changes the algorithm to something like
def fib(n)
  return n if n < 2
  i, y, z = 2, BigInt.new(1), BigInt.new(1)
  loop do
    return z if i == n
    i, y, z = i + 1, z, (y + z)
  end
end
and does things like fib(1_000_000) or even fib(10_000_000), then Ruby's benefit of being able to avoid BigInt for smaller values should be irrelevant, as the vast majority of steps will just deal with BigInts anyway. But Ruby is still significantly faster.
Right, Ruby doesn’t use libgmp. They have their own algorithm for bigint. Maybe it’s more performant than libgmp. Also maybe it ties better with their GC, no idea.
I don't know about the internals of LibGMP, but maybe GC.malloc_atomic can be used instead of GC.malloc. That way the GC will not go into the payload of that allocation looking for possible pointers.
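As a sketch, the allocator hookup could hand GMP pointer-free allocations. This mirrors how the stdlib already redirects GMP's allocation functions to the GC; the LibGMP.set_memory_functions binding and the proc signatures here are assumed from the stdlib sources and may differ between versions:

require "big"

# GMP limb buffers contain only digits, never pointers into the GC heap,
# so the collector finds nothing by scanning their payload.
LibGMP.set_memory_functions(
  # Assumption: swapping GC.malloc for GC.malloc_atomic marks the
  # allocation as pointer-free, so the GC skips scanning it.
  ->(size : LibC::SizeT) { GC.malloc_atomic(size) },
  ->(ptr : Void*, old_size : LibC::SizeT, new_size : LibC::SizeT) { GC.realloc(ptr, new_size) },
  ->(ptr : Void*, size : LibC::SizeT) { GC.free(ptr) }
)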
I think that’s to be expected. The garbage collection doesn’t even get any chance to run. The program is a tight CPU loop with no IO or any other opportunity where execution could possibly jump to the GC fiber.
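One way to check that is to print GC.stats around the loop; if no collection ran, bytes_since_gc just keeps growing (a quick check, not a proper measurement):

require "big"

before = GC.stats
x, y = BigInt.new(1), BigInt.new(1)
200_000.times { x, y = y, x + y } # tight CPU loop, allocating on every step
after = GC.stats
puts "heap_size:      #{before.heap_size} -> #{after.heap_size}"
puts "bytes_since_gc: #{before.bytes_since_gc} -> #{after.bytes_since_gc}"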