Significant performance increase was achieved by using a better primality test.
I had been using the Miller-Rabin tester, which works for any general integer type.
But I switched to using the Lucas Lehmer test used by GIMPS, et al.
Here is a direct pseudocode translation to Crystal of the algorithm.
Here p is the prime to test to create Mp = 2^p - 1 as a prime number.
def mp_prime?
p = self.to_big_i
s, mp = 4.to_big_i, (1.to_big_i << p) - 1
(p-2).times { s = ((s &* s) - 2) % mp }
s.zero?
end
I wrote a comparison test suite, Miller-Rabin (MR) versus the Lucas Lehmer (LL).
def mpcompare
p = (ARGV[0].to_u64 underscore: true)
steps = ARGV.size > 1 ? (ARGV[1].to_u64 underscore: true) : 1u64
puts "Using Lucas Lehmer test"
puts tm { p.mp_prime? }; puts
mp = (1.to_big_i << p) - 1
puts "Using Miller- Rabin test"
puts tm { mp.prime? }; puts
end
Here are some sample results for the 26–31 Mp primes.
My Linux laptop: Lenovo Legion Slim 5, AMD Ryzen 7 8845HS, 5.1GHz, 8C|16T
➜ crystal-projects ./MR_vs_LL 23209
Using Lucas Lehmer test
00:00:01.206712410
Using Miller- Rabin test
00:00:04.954200650
➜ crystal-projects ./MR_vs_LL 44497
Using Lucas Lehmer test
00:00:06.065008826
Using Miller- Rabin test
00:00:22.960613112
➜ crystal-projects ./MR_vs_LL 86243
Using Lucas Lehmer test
00:00:31.244206519
Using Miller- Rabin test
00:02:02.685110966
➜ crystal-projects ./MR_vs_LL 110503
Using Lucas Lehmer test
00:00:59.973348587
Using Miller- Rabin test
00:03:25.674120484
➜ crystal-projects ./MR_vs_LL 132049
Using Lucas Lehmer test
00:01:30.333406242
Using Miller- Rabin test
00:05:23.010111264
➜ crystal-projects ./MR_vs_LL 216091
Using Lucas Lehmer test
00:03:25.890716233
Using Miller- Rabin test
00:16:46.958948555
This is now the total MP generator code.
require "big"
module Primes
module Utils
def mp_prime?
p = self.to_big_i
s, mp = 4.to_big_i, (1.to_big_i << p) - 1
(p-2).times { s = ((s &* s) - 2) % mp }
s.zero?
end
end
end
struct Int; include Primes::Utils end
def mpgen(p, steps = 1u64)
puts "Starting Mp gen exponent = #{p}"
puts "Starting Mp gen process at step #{steps}"
r = (1.to_big_i << p) - 1
modpnk = modpn = 3.to_big_i << p
(steps-1).times { modpnk = (modpnk << 2) + modpn }
loop do
mp = modpnk + r
print "\rsteps = #{steps}"
break if (p = mp.bit_length).mp_prime?
steps += 1
modpnk = (modpnk << 2) + modpn
end
print "\rsteps = #{steps}\n"
puts "Next Mp prime = #{p}"
p
end
def tm; t = Time.monotonic; yield; Time.monotonic - t end
def mpgenerate
p = (ARGV[0].to_u64 underscore: true)
steps = ARGV.size > 1 ? (ARGV[1].to_u64 underscore: true) : 1u64
puts tm { mpgen(p, steps) }; puts
end
mpgenerate
The MR code determines non-primes faster, while LL determines primes faster.
The literature says there are bit optimizations to do the LL mod operation (xxx % mp). There’s also free software to implement LL with GPUs|Cuda|OpenCL.
So a little research has produced significant improvements, with lots still to come.