Is BigInt really slow on Linux, or is it just a Boehm GC issue?

I was reading the following blog post written by @asterite.

The following content is copied from there.


require "big"

def fib(n)
  if n <= 1
    BigInt.new(1)
  else
    fib(n - 1) + fib(n - 2)
  end
end

time = Time.local
puts fib(BigInt.new(46))
puts Time.local - time

Let’s run it:

$ crystal fib.cr --release
2971215073
00:22:22.077676188

Then when I ran the same code with Crystal 1.6-dev, it was much slower than the example above. In fact, I ran it several times and broke it off early every time after 8 minutes.

So I guess something must have changed that makes Crystal not as fast as before.

I think we really need a benchmark suite, like this one, but one that compares the language not to other languages but to itself, to catch this kind of regression.
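For what it's worth, such a self-comparison could be sketched with Crystal's built-in Benchmark module. This is just an illustration, not an existing harness; the two fib variants are the ones discussed in this thread, run at a small n so it finishes quickly:

```crystal
require "benchmark"
require "big"

# Naive recursive fib over BigInt, as in the blog post.
def fib_big(n)
  n <= 1 ? BigInt.new(1) : fib_big(n - 1) + fib_big(n - 2)
end

# The same algorithm over a primitive Int64, for comparison.
def fib_i64(n) : Int64
  n <= 1 ? 1_i64 : fib_i64(n - 1) + fib_i64(n - 2)
end

# Compare iterations per second of the two versions at a small n.
# Short warmup/calculation windows keep the run quick.
Benchmark.ips(warmup: 0.5, calculation: 1) do |x|
  x.report("BigInt fib(20)") { fib_big(BigInt.new(20)) }
  x.report("Int64  fib(20)") { fib_i64(20) }
end
```

Tracking numbers like these across releases would make a regression like this one visible immediately.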


I have to mention another post.

Although the performance is better after some Array/Tuple and puts tricks, I think we have to investigate why this code, even unchanged, could achieve that kind of performance in 2017. If we had applied those tricks back then, would it still beat Crystal 1.4 by a lot?

BTW: the above code ran for more than 18 minutes with still no result. 8X slower than before.

Done. The same code took about 2 minutes roughly 6 years ago.

But now it takes:

2971215073
00:22:22.077676188

11X slower?


Ruby with --jit:

 ╰─ $ 1  ruby --jit 1.cr 
2971215073
48.982288292

Ruby is 27X faster than Crystal.

Try crystal run --release fib.cr or crystal build --release fib.cr && ./fib
@zw963

For me it takes 26 seconds. What’s your machine? I have an M1. I think back then I had a Mac too. Macs are really fast!

With the code posted (had to replace Time.now with Time.utc), it takes 4min 31seconds for me (crystal 1.3.2 on an old MacPro from 2011 with Mojave).

--release seems to make no difference for me in this specific case

Yes, always.

@asterite

I use Arch Linux.

Sorry for that; both test results are based on fib(46), not 42. As you can see, the result is 2971215073. I fixed the source code.

My fault. It now seems not so bad compared to the old Crystal version, although it's still 27X slower than Ruby 3.1.0 with --jit.

@asterite, @kjarex, I tested on my laptop with fib(42) too; this time I copied the running code directly, no mistakes.

require "big"

def fib(n)
  if n <= 1
    1.to_big_i
  else
    fib(n - 1) + fib(n - 2)
  end
end

time = Time.local
puts fib(42.to_big_i)
puts Time.local - time
 ╰─ $ cr build 1.cr --release

 ╭─ 22:52  zw963 ⮀ ~/Crystal/play ⮀ ➦ ruby-3.1.0 
 ╰─ $ ./1 
433494437
00:03:25.531308608

So, @asterite, is the performance difference between Mac and Linux really that big? (26s vs 205s)

My laptop config:

lg gram 17 2020

I really wouldn’t know, but maybe this is really just due to BigInt? I assume that Ruby doesn’t use a bignum until it becomes necessary!?

Right, I think Ruby runs their own implementation of BigInt which is probably more efficient.

One idea I had in mind was not allocating a LibGMP BigInt if the value fits in, say, an Int64.

Yes, the real issue comes from BigInt. The following code is quite fast even when I run fib(46), which exceeds the maximum of Int32.

def fib(n) : Int64
  if n <= 1
    1_i64
  else
    fib(n - 1) + fib(n - 2)
  end
end

time = Time.local
puts fib(46)
puts Time.local - time

 ╰─ $ cr build 1.cr --release

 ╭─ 23:08  zw963 ⮀ ~/Crystal/play ⮀ ➦ ruby-3.1.0 
 ╰─ $ ./1 
2971215073
00:00:09.300332441

I think the current question is: why does the Mac allocate BigInt objects so much faster than Linux?

 ╰─ $ cr version
Crystal 1.6.0-dev [c502c77d8] (2022-09-16)

LLVM: 14.0.6
Default target: x86_64-pc-linux-gnu

The M1 is a beast. It’s super fast. The compiler compiles for me in like 12 seconds, while on other machines I heard it takes like 45 seconds.

In any case, the performance issue with BigInt is known.

One way to improve this is by representing BigInt internally as a union of a primitive value (say Int64) or a LibGMP::MPZ instance. Then if you do BigInt.new(1) it’s just an Int64 under the hood. If we add one to it, it’s still an Int64. But once it exceeds the limit we would just then call libgmp.

But it’s a lot of work to do! So PRs are welcome :slight_smile:
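A rough sketch of that idea follows. The type name and structure are made up for illustration; a real implementation would wrap LibGMP::MPZ directly rather than reuse BigInt, but the promotion-on-overflow logic would look similar:

```crystal
require "big"

# Hypothetical hybrid integer: stores an Int64 while the value fits,
# and only falls back to a heap-allocated BigInt on overflow.
struct HybridInt
  # The union: either the primitive fast path or the GMP-backed slow path.
  @value : Int64 | BigInt

  def initialize(@value : Int64 | BigInt)
  end

  def +(other : HybridInt) : HybridInt
    a, b = @value, other.@value
    if a.is_a?(Int64) && b.is_a?(Int64)
      begin
        # Fast path: checked primitive addition, no allocation at all.
        return HybridInt.new(a + b)
      rescue OverflowError
        # The sum no longer fits in Int64: promote to BigInt.
        return HybridInt.new(a.to_big_i + b)
      end
    end
    # Slow path: at least one side is already a BigInt.
    HybridInt.new(a.to_big_i + b.to_big_i)
  end

  def to_s(io : IO)
    @value.to_s(io)
  end
end

puts HybridInt.new(2_i64) + HybridInt.new(3_i64)              # stays Int64
puts HybridInt.new(Int64::MAX) + HybridInt.new(1_i64)         # promotes
```

With this layout, fib(46) above (whose result still fits in 64 bits) would never touch libgmp or the GC at all.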

It’s exactly that.

Although Ruby's bigint seems in general to perform better than Crystal's.

If one changes the algorithm to something like

def fib(n)
  return n if n < 2
  i, y, z = 2, BigInt.new(1), BigInt.new(1)
  loop do
    return z if i == n
    i, y, z = i + 1, z, y + z
  end
end

and does things like fib(1_000_000) or even fib(10_000_000), then Ruby's benefit of being able to avoid BigInt for smaller values should be irrelevant, since the vast majority of steps deal with BigInts anyway. But Ruby is still significantly faster :face_with_monocle:

require "big"

def fib(n)
  if n <= 1
    BigInt.new(1)
  else
    fib(n - 1) + fib(n - 2)
  end
end

time = Time.local
p fib(BigInt.new(42))
puts Time.local - time

I tried perf on the above code.

It seems most of the time is spent in LibGMP.add, LibGMP.add_ui, and LibGC.malloc (->(size) { GC.malloc(size) }); new-object allocation is only 17.6%.

I'm not sure whether those percentages are related; maybe there is some other issue with the Linux LibGMP build? @asterite

I have a perf.data.xz (281M); if you'd like to analyze it, I can send it to you somewhere.

Right, Ruby doesn’t use libgmp. They have their own algorithm for bigint. Maybe it’s more performant than libgmp. Also maybe it ties better with their GC, no idea.

Yes, I tested it too; Ruby is 2X faster than Crystal, and it also spends a lot of time in the same object allocation.

I don’t know about the internals of LibGMP but maybe GC.malloc_atomic can be used instead of GC.malloc . That way the GC will not go into the payload of that allocation looking for possible pointers.
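To illustrate the distinction (this is the general Boehm GC concept, not a patch to the BigInt bindings): GC.malloc_atomic tells the collector the block will never contain Crystal references, so the mark phase skips its contents entirely, which would fit GMP limb buffers that only hold raw digits:

```crystal
# Sketch: allocating a pointer-free buffer with GC.malloc_atomic instead of
# GC.malloc. Both return a Void* managed by the Boehm collector.
size = LibC::SizeT.new(1024)

# Regular allocation: the GC conservatively scans this memory for pointers.
scanned = GC.malloc(size)

# Atomic allocation: suitable when the block only ever holds raw data
# (like GMP limbs), never references to other GC-managed objects.
unscanned = GC.malloc_atomic(size)

# From the program's point of view, both behave like plain buffers.
unscanned.as(UInt8*)[0] = 42_u8
puts unscanned.as(UInt8*)[0]
```

The caveat is that storing a reference inside an atomic block would hide it from the collector, so this is only safe for genuinely pointer-free payloads.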


I tested this code on an M1 Max both with and without GC. Both take roughly the same amount of time.

➜  ./bigint
2971215073
00:00:07.872998000
➜  ./bigint_nogc
2971215073
00:00:07.872679000

I think that's to be expected. The garbage collector doesn't even get a chance to run. The program is a tight CPU loop with no IO or any other opportunity where execution could jump to the GC fiber.