BigInt is really really slow when run on linux, Or just Boehm GC issue?

asterite · September 22, 2022, 10:14am

It seems using GC.malloc_atomic produces a significant different. Well, it goes from 26 seconds to 23 seconds in my machine.

zw963 · September 22, 2022, 11:26am

Could you please send a patch or diff?

i can test it on my linux laptop.

naqvis · September 22, 2022, 11:50am

I guess it would be this change

diff --git a/src/big/lib_gmp.cr b/src/big/lib_gmp.cr
index 446f3ea5a..1791f00c4 100644
--- a/src/big/lib_gmp.cr
+++ b/src/big/lib_gmp.cr
@@ -240,7 +240,7 @@ lib LibGMP
 end

 LibGMP.set_memory_functions(
-  ->(size) { GC.malloc(size) },
+  ->(size) { GC.malloc_atomic(size) },
   ->(ptr, old_size, new_size) { GC.realloc(ptr, new_size) },
   ->(ptr, size) { GC.free(ptr) }
 )

jgaskins · September 22, 2022, 10:54pm

Fiber or thread? I was under the impression that GC happens outside the main thread. At least, I remember seeing GC threads outside the main thread when profiling.

Regardless of implementation, though, the GC doesn’t need you to relinquish the CPU to collect. For example, this app never uses more than 2.3MB of RAM despite generating tons of garbage and never yielding the CPU:

strings = [] of String

1_000_000_000.times do |i|
  string = i.to_s
  if rand < 0.1
    strings << string
    strings.shift if strings.size > 100
  end
end

asterite · September 22, 2022, 11:58pm

That’s not true. The garbage collector works like this: it reserves some space for memory. When you call malloc, if there’s free space, it gives you that. Otherwise, it runs the GC to see if there’s memory that can be free. If so, it frees it and uses that space for the requested memory. Otherwise it requests more memory.

HertzDevil · September 23, 2022, 12:17am

From their manual:

GMP may use allocated blocks to hold pointers to other allocated blocks. This will limit the assumptions a conservative garbage collection scheme can make.

So no, malloc_atomic is unsafe and will cause those internally pointed blocks to be collected.

zw963 · September 25, 2022, 4:45am

I test on following code use a clear linux env

require "big"

def fib(n)
  if n <= 1
    BigInt.new(1)
  else
    fib(n - 1) + fib(n - 2)
  end
end

time = Time.local
puts fib(BigInt.new(42))
puts Time.local - time

 ╰─ $ cr version
Crystal 1.6.0-dev [f2599414a] (2022-09-25)

LLVM: 14.0.6
Default target: x86_64-pc-linux-gnu

Following is the result before patch:

 ╰─ $ cr run --release 1.cr 
433494437
00:01:37.712517060

Following is the result after apply the following malloc_atomic patch

 ╰─ $ cr run --release 1.cr 
433494437
00:01:35.808699703

There is no big different.

straight-shoota · September 26, 2022, 8:16am

Oh, yeah what I wrote is completely wrong I was writing it on the go and apparently mixed something up. Sorry. I need to be more careful and not write such nonsense.

zw963 · September 26, 2022, 2:20pm

Hi, i don’t know, just ask, how to know this code use more than 2.3MB of RAM? Could you please paste the code?

jgaskins · September 26, 2022, 2:46pm

I was using macOS Activity Monitor. You could also use top or htop.

HertzDevil · September 26, 2022, 5:15pm

Disabling the GC actually ends up using more time because of the sheer amount of dynamic allocations involved.

GMP is meant to achieve peak performance when the values are mutable, which BigInt doesn’t support right now. A recursive implementation that stays true to GMP’s spirit might look like:

struct BigInt
  def add!(other : self) : self
    LibGMP.add(self, self, other)
    self
  end

  def add!(other : Int32) : self
    if other >= 0
      LibGMP.add_ui(self, self, other)
    else
      LibGMP.sub_ui(self, self, -other)
    end
    self
  end
end

ONE = 1.to_big_i

def fib2(ret, n) : Nil
  if n.value <= 1
    ret.value.add!(ONE)
  else
    n.value.add!(-1)
    fib2(ret, n)
    n.value.add!(-1)
    fib2(ret, n)
    n.value.add!(2)
  end
end

def fib2(n : BigInt) : BigInt
  ret = 0.to_big_i
  fib2(pointerof(ret), pointerof(n))
  ret
end

On my M2, fib2(42.to_big_i) takes 6.42 seconds whereas fib(42.to_big_i) takes 24.7, a speedup over 280%. The new version also allocates exactly 3 BigInts: the argument, the return value, and ONE. But it is hard to say if fib2 is idiomatic arithmetic code anymore, and the standard library’s BigInt API is specifically designed so that users don’t have to worry about writing (and more importantly, reading) this kind of code.

Also, for a fairer comparison try using a really large constant for fib(1)'s value on Ruby (something larger than 2 ** 63).

asterite · September 26, 2022, 5:35pm

See also: Add mutating methods to BigInt by Exilor · Pull Request #9825 · crystal-lang/crystal · GitHub

This seems to be a recurring Add mutating methods to BigInt by Exilor · Pull Request #9825 · crystal-lang/crystal · GitHub so I guess we eventually do something about it.

Topic		Replies	Views
The CrystalRuby gem through the Crystal lens Community	5	356	May 10, 2024
Build with --release performance is slow than the 2017 crystal version? Help & Support	16	643	June 13, 2022
BigInt performance	13	1207	October 16, 2020
Crystal 0.34.0 slower? Help & Support	53	1919	June 1, 2020
macOS BigSur: Crystal 1.0.0 is much slower than Ruby 3.0.0 Community	8	732	April 2, 2021

BigInt is really really slow when run on linux, Or just Boehm GC issue?

Related topics