Is BigInt really slow on Linux, or is it just a Boehm GC issue?

I was reading the following blog post written by @asterite.

The following content is copied from there.


require "big"

def fib(n)
  if n <= 1
    BigInt.new(1)
  else
    fib(n - 1) + fib(n - 2)
  end
end

time = Time.local
puts fib(BigInt.new(46))
puts Time.local - time

Let’s run it:

$ crystal fib.cr --release
2971215073
00:22:22.077676188

Then when I ran the same code with Crystal 1.6-dev, it was much slower than the example above. In fact, I ran it several times and broke it off early every time after 8 minutes.

So I guess something must have changed that makes Crystal not as fast as before.

I think we really need a benchmark suite, like this one, but one that compares the language not to other languages but to itself, to catch this kind of regression.
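For what it's worth, such a self-comparison could be sketched with Crystal's built-in Benchmark module. This is just an illustration, not an existing harness; the two fib variants are the ones discussed in this thread, run at a small n so it finishes quickly:

```crystal
require "benchmark"
require "big"

# Naive recursive fib over BigInt, as in the blog post.
def fib_big(n)
  n <= 1 ? BigInt.new(1) : fib_big(n - 1) + fib_big(n - 2)
end

# The same algorithm over a primitive Int64, for comparison.
def fib_i64(n) : Int64
  n <= 1 ? 1_i64 : fib_i64(n - 1) + fib_i64(n - 2)
end

# Compare iterations per second of the two versions at a small n.
# Short warmup/calculation windows keep the run quick.
Benchmark.ips(warmup: 0.5, calculation: 1) do |x|
  x.report("BigInt fib(20)") { fib_big(BigInt.new(20)) }
  x.report("Int64  fib(20)") { fib_i64(20) }
end
```

Tracking numbers like these across releases would make a regression like this one visible immediately.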


I have to mention another post.

Although the performance is better after some Array/Tuple and puts tricks, I think we have to investigate why this code, even unchanged, could achieve that kind of performance in 2017. If we had applied those tricks back then, would it still beat Crystal 1.4 by a lot?

BTW: the above code ran for more than 18 minutes with still no result. 8X slower than before.

Done. The same code took about 2 minutes roughly 6 years ago.

But now it takes:

2971215073
00:22:22.077676188

11X slower?


Ruby with --jit:

 ╰─ $ 1  ruby --jit 1.cr 
2971215073
48.982288292

Ruby is 27X faster than Crystal.

Try crystal run --release fib.cr or crystal build --release fib.cr && ./fib
@zw963

For me it takes 26 seconds. What’s your machine? I have an M1. I think back then I had a Mac too. Macs are really fast!

With the code posted (had to replace Time.now with Time.utc), it takes 4min 31seconds for me (crystal 1.3.2 on an old MacPro from 2011 with Mojave).

--release seems to make no difference for me in this specific case

Yes, always.

@asterite

I use Arch Linux.

Sorry for that; both test results are based on fib(46), not 42. As you can see, the result is 2971215073. I fixed the source code.

My fault. It now seems not so bad compared to the old Crystal version, although it's still 27X slower than Ruby 3.1.0 with --jit.

@asterite, @kjarex, I tested on my laptop with fib(42) too; this time I copied the running code directly, no mistakes.

require "big"

def fib(n)
  if n <= 1
    1.to_big_i
  else
    fib(n - 1) + fib(n - 2)
  end
end

time = Time.local
puts fib(42.to_big_i)
puts Time.local - time
 ╰─ $ cr build 1.cr --release

 ╭─ 22:52  zw963 ⮀ ~/Crystal/play ⮀ ➦ ruby-3.1.0 
 ╰─ $ ./1 
433494437
00:03:25.531308608

So, @asterite, is the performance difference between Mac and Linux really that big? (26s vs 205s)

My laptop config:

lg gram 17 2020

I really wouldn’t know, but maybe this is really just due to BigInt? I assume that Ruby doesn’t use a bignum until it becomes necessary!?

Right, I think Ruby runs their own implementation of BigInt which is probably more efficient.

One idea I had in mind was not allocating a LibGMP BigInt if the value fits in, say, an Int64.

Yes, the real issue comes from BigInt. The following code is quite fast even when I run fib(46), which exceeds the maximum of Int32.

def fib(n) : Int64
  if n <= 1
    1_i64
  else
    fib(n - 1) + fib(n - 2)
  end
end

time = Time.local
puts fib(46)
puts Time.local - time

 ╰─ $ cr build 1.cr --release

 ╭─ 23:08  zw963 ⮀ ~/Crystal/play ⮀ ➦ ruby-3.1.0 
 ╰─ $ ./1 
2971215073
00:00:09.300332441

I think the current question is: why does the Mac allocate BigInt objects so much faster than Linux?

 ╰─ $ cr version
Crystal 1.6.0-dev [c502c77d8] (2022-09-16)

LLVM: 14.0.6
Default target: x86_64-pc-linux-gnu

The M1 is a beast. It’s super fast. The compiler compiles for me in like 12 seconds, while on other machines I heard it takes like 45 seconds.

In any case, the performance issue with BigInt is known.

One way to improve this is by representing BigInt internally as a union of a primitive value (say Int64) or a LibGMP::MPZ instance. Then if you do BigInt.new(1) it’s just an Int64 under the hood. If we add one to it, it’s still an Int64. But once it exceeds the limit we would just then call libgmp.

But it’s a lot of work to do! So PRs are welcome :slight_smile:
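A rough sketch of that idea follows. The type name and structure are made up for illustration; a real implementation would wrap LibGMP::MPZ directly rather than reuse BigInt, but the promotion-on-overflow logic would look similar:

```crystal
require "big"

# Hypothetical hybrid integer: stores an Int64 while the value fits,
# and only falls back to a heap-allocated BigInt on overflow.
struct HybridInt
  # The union: either the primitive fast path or the GMP-backed slow path.
  @value : Int64 | BigInt

  def initialize(@value : Int64 | BigInt)
  end

  def +(other : HybridInt) : HybridInt
    a, b = @value, other.@value
    if a.is_a?(Int64) && b.is_a?(Int64)
      begin
        # Fast path: checked primitive addition, no allocation at all.
        return HybridInt.new(a + b)
      rescue OverflowError
        # The sum no longer fits in Int64: promote to BigInt.
        return HybridInt.new(a.to_big_i + b)
      end
    end
    # Slow path: at least one side is already a BigInt.
    HybridInt.new(a.to_big_i + b.to_big_i)
  end

  def to_s(io : IO)
    @value.to_s(io)
  end
end

puts HybridInt.new(2_i64) + HybridInt.new(3_i64)              # stays Int64
puts HybridInt.new(Int64::MAX) + HybridInt.new(1_i64)         # promotes
```

With this layout, fib(46) above (whose result still fits in 64 bits) would never touch libgmp or the GC at all.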

It’s exactly that.

Although Ruby's bigint seems in general to perform better than Crystal's.

If one changes the algorithm to something like

def fib(n)
  return n if n < 2
  i, y, z = 2, BigInt.new(1), BigInt.new(1)
  loop do
    return z if i == n
    i, y, z = i + 1, z, y + z
  end
end

and does things like fib(1_000_000) or even fib(10_000_000), then Ruby's benefit of being able to avoid BigInt for smaller values should be irrelevant, since the vast majority of steps deal with BigInts anyway. But Ruby is still significantly faster :face_with_monocle:

require "big"

def fib(n)
  if n <= 1
    BigInt.new(1)
  else
    fib(n - 1) + fib(n - 2)
  end
end

time = Time.local
p fib(BigInt.new(42))
puts Time.local - time

I tried perf on the above code.

It seems most of the time is spent in LibGMP.add, LibGMP.add_ui, and LibGC.malloc (->(size) { GC.malloc(size) }); new-object allocation is only 17.6%.

I'm not sure whether those percentages are related; maybe there is some other issue with the Linux LibGMP build? @asterite

I have a perf.data.xz (281M); if you'd like to analyze it, I can send it to you somewhere.

Right, Ruby doesn’t use libgmp. They have their own algorithm for bigint. Maybe it’s more performant than libgmp. Also maybe it ties better with their GC, no idea.

Yes, I tested it too; Ruby is 2X faster than Crystal, and it also spends a lot of time in the same object allocation.

I don’t know about the internals of LibGMP but maybe GC.malloc_atomic can be used instead of GC.malloc . That way the GC will not go into the payload of that allocation looking for possible pointers.
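To illustrate the distinction (this is the general Boehm GC concept, not a patch to the BigInt bindings): GC.malloc_atomic tells the collector the block will never contain Crystal references, so the mark phase skips its contents entirely, which would fit GMP limb buffers that only hold raw digits:

```crystal
# Sketch: allocating a pointer-free buffer with GC.malloc_atomic instead of
# GC.malloc. Both return a Void* managed by the Boehm collector.
size = LibC::SizeT.new(1024)

# Regular allocation: the GC conservatively scans this memory for pointers.
scanned = GC.malloc(size)

# Atomic allocation: suitable when the block only ever holds raw data
# (like GMP limbs), never references to other GC-managed objects.
unscanned = GC.malloc_atomic(size)

# From the program's point of view, both behave like plain buffers.
unscanned.as(UInt8*)[0] = 42_u8
puts unscanned.as(UInt8*)[0]
```

The caveat is that storing a reference inside an atomic block would hide it from the collector, so this is only safe for genuinely pointer-free payloads.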


I tested this code on an M1 Max both with and without GC. Both take roughly the same amount of time.

➜  ./bigint
2971215073
00:00:07.872998000
➜  ./bigint_nogc
2971215073
00:00:07.872679000

I think that's to be expected. The garbage collector doesn't even get a chance to run. The program is a tight CPU loop with no IO or any other opportunity where execution could jump to the GC fiber.