Optimized Mersenne Prime Search

Significant performance increase was achieved by using a better primality test.
I had been using the Miller-Rabin tester, which works for any general integer type.
But I switched to using the Lucas Lehmer test used by GIMPS, et al.

Here is a direct pseudocode translation to Crystal of the algorithm.
Here p is the prime to test to create Mp = 2^p - 1 as a prime number.

def mp_prime?
  p = self.to_big_i
  s, mp = 4.to_big_i, (1.to_big_i << p) - 1
  (p-2).times { s = ((s &* s) - 2) % mp }
  s.zero?
end

I wrote a comparison test suite, Miller-Rabin (MR) versus the Lucas Lehmer (LL).

def mpcompare
  p = (ARGV[0].to_u64 underscore: true)
  steps = ARGV.size > 1 ? (ARGV[1].to_u64 underscore: true) : 1u64
  puts "Using Lucas Lehmer test"
  puts tm { p.mp_prime? }; puts
  mp = (1.to_big_i << p) - 1
  puts "Using Miller- Rabin test"
  puts tm { mp.prime? }; puts
end

Here are some sample results for the 26–31 Mp primes.

My Linux laptop: Lenovo Legion Slim 5, AMD Ryzen 7 8845HS, 5.1GHz, 8C|16T

➜  crystal-projects ./MR_vs_LL 23209
Using Lucas Lehmer test
00:00:01.206712410

Using Miller- Rabin test
00:00:04.954200650

➜  crystal-projects ./MR_vs_LL 44497
Using Lucas Lehmer test
00:00:06.065008826

Using Miller- Rabin test
00:00:22.960613112

➜  crystal-projects ./MR_vs_LL 86243
Using Lucas Lehmer test
00:00:31.244206519

Using Miller- Rabin test
00:02:02.685110966

➜  crystal-projects ./MR_vs_LL 110503
Using Lucas Lehmer test
00:00:59.973348587

Using Miller- Rabin test
00:03:25.674120484

➜  crystal-projects ./MR_vs_LL 132049
Using Lucas Lehmer test
00:01:30.333406242

Using Miller- Rabin test
00:05:23.010111264

➜  crystal-projects ./MR_vs_LL 216091
Using Lucas Lehmer test
00:03:25.890716233

Using Miller- Rabin test
00:16:46.958948555

This is now the total MP generator code.

require "big"

module Primes
  module Utils
    def mp_prime?
      p = self.to_big_i
      s, mp = 4.to_big_i, (1.to_big_i << p) - 1
      (p-2).times { s = ((s &* s) - 2) % mp }
      s.zero?
    end
  end
end

struct Int; include Primes::Utils end

def mpgen(p, steps = 1u64)
  puts "Starting Mp gen exponent = #{p}"
  puts "Starting Mp gen process at step #{steps}"
  r = (1.to_big_i << p) - 1
  modpnk = modpn = 3.to_big_i << p
  (steps-1).times { modpnk = (modpnk << 2) + modpn }
  loop do
    mp = modpnk + r
    print "\rsteps = #{steps}"
    break if (p = mp.bit_length).mp_prime?
    steps += 1
    modpnk = (modpnk << 2) + modpn
  end
  print "\rsteps = #{steps}\n"
  puts "Next Mp prime = #{p}"
  p
end

def tm; t = Time.monotonic; yield; Time.monotonic - t end

def mpgenerate
  p = (ARGV[0].to_u64 underscore: true)
  steps = ARGV.size > 1 ? (ARGV[1].to_u64 underscore: true) : 1u64
  puts tm { mpgen(p, steps) }; puts
end

mpgenerate

The MR code determines non-primes faster, while LL determines primes faster.
The literature says there are bit optimizations to do the LL mod operation (xxx % mp). There’s also free software to implement LL with GPUs|Cuda|OpenCL.

So a little research has produced significant improvements, with lots still to come.