Optimized Mersenne Prime Search

Mersenne primes (Mp) are primes of the form Mp = 2^p - 1, for p prime.

The newest one was found on October 12, 2024, and announced on October 21, 2024 (it is also the largest known prime), for p = 136,279,841. This is an integer of 136,279,841 consecutive binary ‘1’ bits, which is over 41M decimal digits long. The previous Mp was found December 7, 2018 and announced December 21, 2018, almost a full 6 years between new primes. That was the longest dry spell of the computer age.


Here is the current list of the exponents p for all 52 known Mp primes.

2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497, 86243, 110503, 132049, 216091, 756839, 859433, 1257787, 1398269, 2976221, 3021377, 6972593, 13466917, 20996011, 24036583, 25964951, 30402457, 32582657, 37156667, 42643801, 43112609, 57885161, 74207281, 77232917, 82589933, 136279841

Its finder, Luke Durant, was formerly employed by Nvidia. He strung together a bunch of cloud servers to do his computing, and said he spent around $2M in total over two years to do it.

Well I ain’t got $2M, but I do have a better algorithm for finding them.

Below is Crystal code to generate all the odd prime Mp (the code implementing prime? is not shown). It’s based on math I’ve developed that I call Prime Generator Theory (PGT).

PGT is based on the use of modular groups to find and characterize everything you want to know about primes, their gaps, and their distribution. All the (odd) primes exist as the congruent residue values of modular groups. The Mp occur as the congruent values of just one residue, for optimally crafted modular group sizes.

Read this paper for the basics of PGT’s mathematical foundation. It’s very simple, and requires only basic arithmetic and logic (no higher order math needed).

This paper (pgs 7-8) provides the math structure for an optimized Mp search.

From PGT, Mp values have the form:
Mp = 2^p - 1 = (2^n · 3)·k + (2^n - 1), n odd (1, 3, 5, 7…)

However, to find the Mp, k has this special optimum form:
k_0 = 0, k_(i+1) = 4·k_i + 1 = (k_i << 2) + 1
1, 5, 21, 85, 341, 1365, 5461, 21845, 87381, 349525, 1398101, 5592405…
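
As a quick sanity check (my own illustration, not from the paper), the identity and the k progression can be verified for small n:

require "big"

# Verify Mp = (2^n · 3)·k + (2^n - 1) for n = 3 over the k progression
# 1, 5, 21, 85, ... Each step's candidate has exactly n + 2i '1' bits,
# i.e. it is the Mersenne number with exponent n + 2i.
n = 3
k = 1.to_big_i
5.times do |i|
  mp = (3.to_big_i << n) * k + ((1.to_big_i << n) - 1)
  puts "step #{i + 1}: k = #{k}, mp = #{mp} (#{mp.bit_length} bits)"
  k = (k << 2) + 1
end
# step 1: k = 1, mp = 31 (5 bits); step 2: k = 5, mp = 127 (7 bits); ...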

The code for mpgen uses a known Mp p value to generate the next Mp p value.

This is the single-threaded version.

require "big"   # BigInt

def mpgen(p)
  puts "Starting Mp prime = #{p}"
  k, steps = 1.to_big_i, 1u64
  r = (1.to_big_i << p) - 1   # residue 2^p - 1
  modpn = 3.to_big_i << p     # modulus 2^p · 3
  loop do
    mp = modpn*k + r          # candidate Mp value
    (p = mp.bit_length; break) if mp.prime?  # prime? is the tester (not shown)
    k = (k << 2) + 1          # k_(i+1) = 4·k_i + 1
    steps += 1
    print "\rsteps = #{steps}" if steps.divisible_by? 10
  end
  print "\rsteps = #{steps}\n"
  puts "Next Mp prime = #{p}"
  p
end

def tm; t = Time.monotonic; yield; Time.monotonic - t end

def mpgens(p)
  p = p.to_big_i
  loop { puts tm {p = mpgen(p)}; puts }
end

mpgens 3
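
To try it, save the program as MPgenerate.cr and build it with the flags given later in the thread:

$ crystal build --release --mcpu native MPgenerate.cr
$ ./MPgenerate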

Here is output for generating the first few Mp primes.
The number of steps and times get larger as the primes increase.

➜  crystal-projects ./MPgenerate
Starting Mp prime = 3
steps = 1
Next Mp prime = 5
00:00:00.000053029

Starting Mp prime = 5
steps = 1
Next Mp prime = 7
00:00:00.000008866

Starting Mp prime = 7
steps = 3
Next Mp prime = 13
00:00:00.000016440

Starting Mp prime = 13
steps = 2
Next Mp prime = 17
00:00:00.000020148

Starting Mp prime = 17
steps = 1
Next Mp prime = 19
00:00:00.000011262

Starting Mp prime = 19
steps = 6
Next Mp prime = 31
00:00:00.000027081

Starting Mp prime = 31
steps = 15
Next Mp prime = 61
00:00:00.000056636

Starting Mp prime = 61
steps = 14
Next Mp prime = 89
00:00:00.000098844

So here’s the mission for those interested.
Using this optimized algorithm we can find the next Mp (and others after it).

My current laptop is:
Linux, Lenovo Legion 5, AMD Ryzen 7 8845HS, 3.8-5.1 GHz, 8C|16T, 16MB cache.

There are at least two things needed to make this practical.

  1. Optimize the arithmetic computations.
    Literally all the math is done with BigInts, so we need to optimize the math done with them.

  2. Create an optimized parallel system. There are two ways I see of doing this.
    a) Create optimized parallel code that can run locally on multi-core CPUs.
    b) Like GIMPS, create a distributed network that can compute values for each k step. The simple(st) way is to distribute the step value(s) to parallel compute engines, which then generate the k value progression for their step, use it to compute that mp value, check it, and send back the results (see the sketch below).
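
For example, here is a minimal sketch (my own, with invented names) of what a stateless worker for 2(b) could compute. Since k has the closed form k_i = (4^i - 1)/3, a worker handed just (n, step) can rebuild its k and mp candidate locally, with no shared state:

require "big"

# Hypothetical worker routine: builds the step i test value for exponent n
# directly from the closed form k_i = (4^i - 1)/3.
def candidate(n, i)
  k = ((1.to_big_i << (2 * i)) - 1) // 3          # k_i = (4^i - 1)/3
  (3.to_big_i << n) * k + ((1.to_big_i << n) - 1) # (2^n · 3)·k_i + (2^n - 1)
end

# candidate(3, 2) # => 127, the value the serial loop reaches at step 2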

It’s an extremely good project to advertise|market the power|benefits of Crystal!

I’ve almost finished the paper that explains all the math, the code, and the results. I don’t want to release it until the next Mp is found, to prove that it works (and for PR!).
But I’ll provide the paper to anyone who’s a part of the project, to see all the math.

So that’s it.

I’d love to hear what the devs think, and anyone else with hardware|resources to apply to the project.

Jabari Zakiya


Wow! So cool!

I’ve been using mosquito (shameless plug) to distribute GPU loads out of the cloud and into a Mac M chip in my house. Cloud GPUs are absurdly expensive right now, and I was able to get a used M series for the cost of about a month’s GPU rent.

Access to the Redis backend is all that’s really needed for mosquito to run, and that’s easy enough to facilitate. It’s been a really effortless pattern for me to maintain, several months in at this point.

Admittedly, Mosquito itself feels like it would be a bit heavy-handed for the use case of distributing a workload of BigInts (if I’m reading this right). A Redis server by itself would easily serve as a central authority for what numbers are being computed, pending, not started yet, etc.

I wonder what it looks like to run a computation like this. Does a single “prime check” finish in hours? Minutes?

In order to build a group of workers to compute this without spending $2M, what concessions can be made? Is this a GPU-accelerated load? What does it demand in memory space and CPU time?

I have a dozen Raspberry Pis sitting collecting dust, but they’re obviously limited on both memory and CPU. How much does that matter?


I think this is really great :exploding_head:

I am very sorry to tell you this, but I cannot come up with $2 million.
I could save money every day, change to a better-paying job, buy a lottery ticket, switch to a cheaper toothpaste, and work hard for a month or a year, but it still might be difficult. :sob:

Let me share something a bit more useful. The Ruby Association provides grants for projects aimed at improving the Ruby ecosystem. Each year, 3-4 projects are selected, some of which are science-related. Due to low visibility, competition is low, and I’ve received a grant myself. However, the grant amount is modest, around 500,000 yen (about $3,300), far from enough to discover new prime numbers.


I would suggest not ruling anything out at this point in the brainstorming.

Raspberry Pis are much faster now, and RPi clusters could be viable.

https://www.extremetech.com/extreme/259553-750-raspberry-pi-mini-computers-turned-supercomputer-los-alamos-national-laboratory

We could also create a formal crowdsourcing project (headed by Manas?) to
publicize the project, and get $$$, people, resources, and publicity for it.

We could even approach Luke Durant (who found the current Mp) and
propose he use his system resources to implement this better algorithm!

Again, this is a project that will automatically generate a lot of interest
from a wide swath of people|groups in different fields and backgrounds.

And wouldn’t it be nice to have Crystal associated with HPC computing?

On my $1500 Linux laptop, starting from scratch, I was able to generate the
first 26 Mp primes in less than 9 hours total!

This is from the Wikipedia page on Mersenne Primes for comparison.

The search for Mersenne primes was revolutionized by the introduction of the electronic digital computer. Alan Turing searched for them on the Manchester Mark 1 in 1949,[12] but the first successful identification of a Mersenne prime, M521, by this means was achieved at 10:00 pm on January 30, 1952, using the U.S. National Bureau of Standards Western Automatic Computer (SWAC) at the Institute for Numerical Analysis at the University of California, Los Angeles (UCLA), under the direction of D. H. Lehmer, with a computer search program written and run by Prof. R. M. Robinson. It was the first Mersenne prime to be identified in thirty-eight years; the next one, M607, was found by the computer a little less than two hours later. Three more — M1279, M2203, and M2281 — were found by the same program in the next several months. M4423 was the first prime discovered with more than 1000 digits, M44497 was the first with more than 10,000, and M6972593 was the first with more than a million. In general, the number of digits in the decimal representation of M_n equals ⌊n × log10(2)⌋ + 1, where ⌊x⌋ denotes the floor function (or equivalently ⌊log10(M_n)⌋ + 1).
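
That digit formula is easy to check (my own illustration):

# digits of M_n = ⌊n·log10(2)⌋ + 1, for the two largest known Mp
puts (82_589_933 * Math.log10(2.0)).to_i64 + 1   # => 24862048
puts (136_279_841 * Math.log10(2.0)).to_i64 + 1  # => 41024320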

Here is more output from my laptop. Please appreciate these results.

Starting Mp prime = 4423
steps = 2633
Next Mp prime = 9689
00:02:23.229006562

Starting Mp prime = 9689
steps = 126
Next Mp prime = 9941
00:00:14.844834712

Starting Mp prime = 9941
steps = 636
Next Mp prime = 11213
00:01:28.605453907

Starting Mp prime = 11213
steps = 4362
Next Mp prime = 19937
00:28:45.329705395

Starting Mp prime = 19937
steps = 882
Next Mp prime = 21701
00:11:15.001600745

Starting Mp prime = 21701
steps = 754
Next Mp prime = 23209
00:11:23.369180573

Starting Mp prime = 23209
steps = 10644
Next Mp prime = 44497
07:43:34.371770698

I’ve now made the code easier to use, taking CLI inputs.

require "big"

def mpgen(p, steps = 1u64)
  puts "Starting Mp prime = #{p}"
  puts "Starting steps for k is #{steps}"
  k = 0.to_big_i
  steps.times { k = (k << 2) + 1 }  # advance k to the starting step
  r = (1.to_big_i << p) - 1
  modpn = 3.to_big_i << p
  loop do
    mp = modpn*k + r
    print "\rsteps = #{steps}"
    (p = mp.bit_length; break) if mp.prime?
    k = (k << 2) + 1
    steps += 1
  end
  print "\rsteps = #{steps}\n"
  puts "Next Mp prime = #{p}"
  p
end

def tm; t = Time.monotonic; yield; Time.monotonic - t end

def mpgenerate
  p = (ARGV[0].to_u64 underscore: true)
  steps = ARGV.size > 1 ? (ARGV[1].to_u64 underscore: true) : 1u64
  puts tm { mpgen(p, steps) }; puts
end

mpgenerate

Here’s the gist file of all the code.

Now you enter the prime and the k step value from the CLI as shown below.
Please play with it on your system for different Mp p|steps values.

➜  crystal-projects ./MPgen 23209 10_600
Starting Mp prime = 23209
Starting steps for k is 10600
steps = 10644
Next Mp prime = 44497
00:03:44.412783193

Before, starting from k = 1, it took 7 hrs 43 mins to go from Mp 23209 to 44497.
Starting from the 10600th k step it took 3 mins 44 secs.

So an optimized search would start from some large step value for a given Mp prime. I have to do more math analysis to determine the largest safe starting step size as the Mp primes get larger and larger.
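
For example, the step-to-exponent relation (step i from exponent n tests exponent n + 2·i, which the outputs above bear out: 23209 + 2·10644 = 44497) gives a direct way to express a starting step. This is a sketch of mine, not the final strategy:

# Sketch: the step that starts the search at an assumed exponent floor.
# If the assumption "no new Mp below floor" is wrong, a smaller Mp would be
# missed, so choosing floor safely is exactly the open math question here.
def start_step(n : UInt64, floor : UInt64) : UInt64
  floor > n ? (floor - n) // 2 : 1u64
end

# start_step(23209u64, 44400u64) # => 10595, close to the 10644 steps seen above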

Hopefully you’re seeing there’s lots of room for various optimizations to the whole process.

(Ideally this is what GIMPS should be doing, but I couldn’t get them to take me seriously. They continue to use and test every prime p to calculate Mp = 2^p - 1, but PGT math shows you don’t have to do that! It seems the only way to get people to listen is to show them the results using a better technique. And as the saying goes: if you want to see something done right, do it yourself.)

I lack the mathematical aptitude to understand what is happening here, but if I may freely brainstorm, I can suggest a way for individuals to leverage cloud services like AWS, GCP, or Azure to execute large-scale workflows. (Note: this approach isn’t cost-effective.)

Linux command-line tools are commonly used for genetic data analysis in medicine and the life sciences. In bioinformatics, experts develop command-line tools in various languages, such as Java, Python, R, C, C++, Rust, and Nim. However, no single person can be an expert in all these areas, so workflow languages like Nextflow are invaluable for integrating and running these tools efficiently in the cloud.

Nextflow is an open-source tool supported by a Spanish company, Seqera. It even provides a web interface, allowing users to integrate inputs and outputs of various tools and efficiently use cloud computing resources like AWS. This setup enables parallel processing of large sample sets and allows workflows to resume from mid-point if any errors occur.

When I conducted my modest degree research, I used Nextflow and AWS to access computational resources, as there was no one around with sufficient computing expertise or access to an HPC environment. Ultimately, I spent around $1,500 to $2,000, covering all expenses myself. Cloud computing enabled me to conduct processes on a scale that would be impossible on a local computer.

How effective this type of workflow language might be for prime number searches is uncertain, but it can enable parallel processing on the scale of 200 to 300 cores. I recently used Nextflow and AWS to test short-term processing like this.

Due to its high cost, this approach isn’t suitable for long-term runs. However, it can be very effective if you need vast computational resources over a short period. There may be workflow languages widely used in mathematics and physics. If you don’t have an HPC environment freely available for personal use, this could be one option for demonstrating your work.


In playing with the software, I realized (again) that you don’t have to start from strictly prime values. Looking back at my math (developed circa 2018) shows you can use any odd n exponent value to find the next Mp prime greater than it.

Mp = 2^p - 1 = (2^n · 3)·k + (2^n - 1), n odd (1, 3, 5, 7…)

I’ve thus changed the code’s first output statement (in gist too) to reflect this.

def mpgen(p, steps = 1u64)
  puts "Starting Mp gen exponent = #{p}"  # <--------
  puts "Starting steps for k is #{steps}"
  k = 0.to_big_i
  steps.times { k = (k << 2) + 1 }
  r = (1.to_big_i << p) - 1
  modpn = 3.to_big_i << p
  loop do
    mp = modpn*k + r
    print "\rsteps = #{steps}"
    (p = mp.bit_length; break) if mp.prime?
    k = (k << 2) + 1
    steps += 1
  end
  print "\rsteps = #{steps}\n"
  puts "Next Mp prime = #{p}"
  p
end

Here’s what this means.

Using GIMPS’s method, finding the current Mp of p = 136,279,841 meant checking every
prime > 82,589,933 (the previous largest Mp prime) before reaching it, a span
of 2,901,904 primes. You can understand now why finding it took 6 years between them, and 2 years for Luke Durant using his resources.

Let’s look at specific examples to drive home this point.

The next Mp prime after 89 is 107.
There are 4 primes greater than 89 up to 107 {97, 101, 103, 107}, so GIMPS would check each, for 4 steps.

Using my method, starting from 89 with k = 1 takes 9 k steps.
But I don’t need to know what the primes are up to the next Mp prime!!

But knowing 97 is the next prime > 89, we can start from 95 (the largest odd n < 97),
and starting from k = 1 finds 107 in 6 k steps. But starting from the 3rd k step
takes only 4 steps, the same as GIMPS.

➜  crystal-projects ./MPgen 89
Starting Mp gen exponent = 89
Starting steps for k is 1
steps = 9
Next Mp prime = 107
00:00:00.000190567

➜  crystal-projects ./MPgen 95
Starting Mp gen exponent = 95
Starting steps for k is 1
steps = 6
Next Mp prime = 107
00:00:00.000177984

➜  crystal-projects ./MPgen 95 3
Starting Mp gen exponent = 95
Starting steps for k is 3
steps = 6
Next Mp prime = 107
00:00:00.000060834

Very importantly, this method ensures, 100%, that no smaller prime between them can
produce another (smaller) Mp value, so you never have to go back and check for them! Thus, all the Mp primes are guaranteed to be found in order!

Thus GIMPS has to do this, per the Wikipedia article.

Since 1997, all newly found Mersenne primes have been discovered by the Great Internet Mersenne Prime Search, a distributed computing project. In December 2020, a major milestone in the project was passed after all exponents below 100 million were checked at least once.

And also:

In late 2020, GIMPS began using a new technique to rule out potential Mersenne primes called the Probable prime (PRP) test, based on development from Robert Gerbicz in 2017, and a simple way to verify tests developed by Krzysztof Pietrzak in 2018. Due to the low error rate and ease of proof, this nearly halved the computing time to rule out potential primes over the Lucas-Lehmer test (as two users would no longer have to perform the same test to confirm the other's result), although exponents passing the PRP test still require one to confirm their primality.

Here again is the current list of the 52 known Mp prime values.

2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701, 23209, 44497, 86243, 110503, 132049, 216091, 756839, 859433, 1257787, 1398269, 2976221, 3021377, 6972593, 13466917, 20996011, 24036583, 25964951, 30402457, 32582657, 37156667, 42643801, 43112609, 57885161, 74207281, 77232917, 82589933, 136279841

In several cases, the next Mp prime is nearly double (100% larger) in value from the previous one. And 136,279,841 is 65% larger than its predecessor, 82,589,933. And again, there are 2,901,904 primes to check between them.

So to search optimally: as the density of primes decreases as they get larger, the density of Mp primes decreases even more. Thus you don’t want (or need) to start the search process “close” to the last known Mp prime.

Thus the relevant questions for my method become:

  1. How large can the starting exponents be to ensure not missing any new Mp?
  2. And, how large can the starting k step values be too?

Intelligently picking them will greatly reduce the search times, and resources needed.

New improvement.

I eliminated a multiplication and the need to construct the k multipliers.
I now only do bit shifts and adds.

So this.

def mpgen(p, steps = 1u64)
  puts "Starting Mp gen exponent = #{p}"
  puts "Starting steps for k is #{steps}"
  k = 0.to_big_i
  steps.times { k = (k << 2) + 1 }
  r = (1.to_big_i << p) - 1
  modpn = 3.to_big_i << p
  loop do
    mp = modpn*k + r
    print "\rsteps = #{steps}"
    (p = mp.bit_length; break) if mp.prime?
    k = (k << 2) + 1
    steps += 1
  end
  print "\rsteps = #{steps}\n"
  puts "Next Mp prime = #{p}"
  p
end

Can be better|faster computed as this.

def mpgen(p, steps = 1u64)
  puts "Starting Mp gen exponent = #{p}"
  puts "Starting Mp gen process at step #{steps}"
  r = (1.to_big_i << p) - 1
  modpnk = modpn = 3.to_big_i << p
  (steps-1).times { modpnk = (modpnk << 2) + modpn }  # advance to starting step
  loop do
    mp = modpnk + r
    print "\rsteps = #{steps}"
    (p = mp.bit_length; break) if mp.prime?
    steps += 1
    # modpn·k_(i+1) = 4·(modpn·k_i) + modpn, so no multiply is needed
    modpnk = (modpnk << 2) + modpn
  end
  print "\rsteps = #{steps}\n"
  puts "Next Mp prime = #{p}"
  p
end

For larger numbers of steps, performance is observably faster.

➜  crystal-projects ./MPgen1b 11213
Starting Mp gen exponent = 11213
Starting Mp gen process at step 1
steps = 4362
Next Mp prime = 19937
00:26:58.407713741

➜  crystal-projects ./MPgen1b 23209
Starting Mp gen exponent = 23209
Starting Mp gen process at step 1
steps = 10644
Next Mp prime = 44497
07:29:08.233810694

This is now probably the most efficient way to compute the mp test values.
This just leaves using the “best” primality test algorithm|implementation for prime?.

One alternative is to use the primality tester used by GIMPS for very large numbers.
I believe GIMPS is written in C|C++; if so, it should be possible to extract it and wrap it for use by Crystal, as sketched below.
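
Purely as a shape sketch (every name here is invented for illustration; the real GIMPS/gwnum API has its own functions that would have to be declared from its headers), a Crystal C binding would look like:

# Hypothetical binding: LibMpTest, "mptest", and ll_test are made-up names
# showing Crystal's C FFI mechanism, not an actual GIMPS interface.
@[Link("mptest")]
lib LibMpTest
  fun ll_test(p : UInt64) : Int32 # assume: returns 1 if 2^p - 1 is prime
end

# puts LibMpTest.ll_test(61) == 1 # would => true, if such a library existed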

A significant performance increase was achieved by using a better primality test.
I had been using the Miller-Rabin tester, which works for any general integer type.
But I switched to the Lucas-Lehmer test used by GIMPS, et al.

Here is a direct translation to Crystal of the algorithm’s pseudocode.
Here p is the prime exponent to test, which makes Mp = 2^p - 1 the candidate prime.

def mp_prime?
  p = self.to_big_i
  s, mp = 4.to_big_i, (1.to_big_i << p) - 1  # LL seed s = 4; Mp = 2^p - 1
  (p-2).times { s = ((s &* s) - 2) % mp }    # s = s*s - 2 (mod Mp), p-2 times
  s.zero?                                    # Mp is prime iff s ≡ 0 (mod Mp)
end
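
For instance, once it’s included into Int (as in the full program below), known small cases check out:

puts 31.mp_prime? # => true  (2^31 - 1 = 2147483647 is prime)
puts 29.mp_prime? # => false (2^29 - 1 = 536870911 = 233 · 1103 · 2089)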

I wrote a comparison test suite: Miller-Rabin (MR) versus Lucas-Lehmer (LL).

def mpcompare
  p = (ARGV[0].to_u64 underscore: true)
  steps = ARGV.size > 1 ? (ARGV[1].to_u64 underscore: true) : 1u64
  puts "Using Lucas Lehmer test"
  puts tm { p.mp_prime? }; puts
  mp = (1.to_big_i << p) - 1
  puts "Using Miller- Rabin test"
  puts tm { mp.prime? }; puts
end

Here are some sample results for Mp primes M26–M31.

My Linux laptop: Lenovo Legion Slim 5, AMD Ryzen 7 8845HS, 5.1GHz, 8C|16T

➜  crystal-projects ./MR_vs_LL 23209
Using Lucas Lehmer test
00:00:01.206712410

Using Miller-Rabin test
00:00:04.954200650

➜  crystal-projects ./MR_vs_LL 44497
Using Lucas Lehmer test
00:00:06.065008826

Using Miller-Rabin test
00:00:22.960613112

➜  crystal-projects ./MR_vs_LL 86243
Using Lucas Lehmer test
00:00:31.244206519

Using Miller-Rabin test
00:02:02.685110966

➜  crystal-projects ./MR_vs_LL 110503
Using Lucas Lehmer test
00:00:59.973348587

Using Miller-Rabin test
00:03:25.674120484

➜  crystal-projects ./MR_vs_LL 132049
Using Lucas Lehmer test
00:01:30.333406242

Using Miller-Rabin test
00:05:23.010111264

➜  crystal-projects ./MR_vs_LL 216091
Using Lucas Lehmer test
00:03:25.890716233

Using Miller-Rabin test
00:16:46.958948555

This is now the total MP generator code.

require "big"

module Primes
  module Utils
    def mp_prime?
      p = self.to_big_i
      s, mp = 4.to_big_i, (1.to_big_i << p) - 1
      (p-2).times { s = ((s &* s) - 2) % mp }
      s.zero?
    end
  end
end

struct Int; include Primes::Utils end

def mpgen(p, steps = 1u64)
  puts "Starting Mp gen exponent = #{p}"
  puts "Starting Mp gen process at step #{steps}"
  r = (1.to_big_i << p) - 1
  modpnk = modpn = 3.to_big_i << p
  (steps-1).times { modpnk = (modpnk << 2) + modpn }
  loop do
    mp = modpnk + r
    print "\rsteps = #{steps}"
    break if (p = mp.bit_length).mp_prime?  # LL-test the candidate's exponent
    steps += 1
    modpnk = (modpnk << 2) + modpn
  end
  print "\rsteps = #{steps}\n"
  puts "Next Mp prime = #{p}"
  p
end

def tm; t = Time.monotonic; yield; Time.monotonic - t end

def mpgenerate
  p = (ARGV[0].to_u64 underscore: true)
  steps = ARGV.size > 1 ? (ARGV[1].to_u64 underscore: true) : 1u64
  puts tm { mpgen(p, steps) }; puts
end

mpgenerate

The MR code determines non-primes faster, while LL determines primes faster.
The literature says there are bit optimizations for the LL mod operation (xxx % mp). There’s also free software implementing LL with GPUs|CUDA|OpenCL.

So a little research has produced significant improvements, with lots still to come.


The Mersenne Primes Wikipedia page shows how to optimize the mod (xx % mp) operations.

The mod M operation can be made particularly efficient on standard binary computers by observing that

    k ≡ (k mod 2^n) + ⌊k / 2^n⌋ (mod 2^n − 1)

This says that the least significant n bits of k plus the remaining bits of k are equivalent to k modulo 2^n−1. This equivalence can be used repeatedly until at most n bits remain. In this way, the remainder after dividing k by the Mersenne number 2^n−1 is computed without using division. For example,
916 mod 2^5−1 = 0b1110010100 mod 2^5−1
              = ((916 mod 2^5) + int(916 ÷ 2^5)) mod 2^5−1
              = (0b10100 + 0b11100) mod 2^5−1
              = 0b110000 mod 2^5−1
              = (0b10000 + 0b1) mod 2^5−1
              = 0b10001 mod 2^5−1
              = 0b10001
              = 17

Moreover, since s × s will never exceed M^2 < 2^2p, this simple technique converges in at most 1 p-bit addition (and possibly a carry from the pth bit to the 1st bit), which can be done in linear time. This algorithm has a small exceptional case. It will produce 2^n−1 for a multiple of the modulus rather than the correct value of 0. However, this case is easy to detect and correct.

Here’s a 3|4x faster implementation using the bit-optimized mod operation.

require "big"

module Primes
  module Utils
    def mp_prime?
      p = self.to_big_i
      s, mp = 4.to_big_i, (1.to_big_i << p) - 1
      (p-2).times { s = mp_mod(((s * s) - 2), p, mp) }
      s.divisible_by? mp  # also catches s == mp, a multiple of the modulus
    end

    private def mp_mod(n, p, mp)
      # n mod (2^p - 1) via shift-and-add: fold the high bits onto the low p bits
      while n > mp
        n = (n >> p) + (n & mp)
      end
      n
    end
  end
end
end

struct Int; include Primes::Utils end

Here are the time comparisons between the previous and updated versions of mp_prime?.

Old version

➜  crystal-projects ./MR_vs_LL_test 21701
Using Lucas Lehmer test
true
00:00:01.021152048

➜  crystal-projects ./MR_vs_LL_test 23209
Using Lucas Lehmer test
true
00:00:01.178064500

➜  crystal-projects ./MR_vs_LL_test 86243
Using Lucas Lehmer test
true
00:00:30.419404121

➜  crystal-projects ./MR_vs_LL_test 216091
Using Lucas Lehmer test
true
00:03:19.896989333

➜  crystal-projects ./MR_vs_LL_test 756839
Using Lucas Lehmer test
true
00:51:53.460519407


New version

➜  crystal-projects ./MR_vs_LL_testa 21701
Using Lucas Lehmer test
true
00:00:00.281051190

➜  crystal-projects ./MR_vs_LL_testa 23209
Using Lucas Lehmer test
true
00:00:00.331042313

➜  crystal-projects ./MR_vs_LL_testa 86243
Using Lucas Lehmer test
true
00:00:07.194113919

➜  crystal-projects ./MR_vs_LL_testa 216091
Using Lucas Lehmer test
true
00:01:03.950343715

➜  crystal-projects ./MR_vs_LL_testa 756839
Using Lucas Lehmer test
true
00:13:25.263320206

The last needed computation optimization is for the squaring operation (s * s).
As the article says, the fastest for large numbers are FFT multiplication algorithms.

This has become much more usable now, with a little bit more to go.

ADDITION
Simplified code, changed the s seed value to 10, slightly faster performance.

# Lucas-Lehmer primality test for Mersenne Primes
def mp_prime?
  p = self.to_big_i
  s, mp = 10.to_big_i, (1.to_big_i << p) - 1  # 10 is an alternate LL starting value
  (p-2).times do
     s = (s * s) - 2
     while s > mp; s = (s >> p) + (s & mp) end
  end
  s.divisible_by? mp
end

Here’s the latest update (2024/11/15) on the project.

Below is the basic code outline. I replaced all that code for
the Miller-Rabin primality test with a SoZ primality test, prime?.
It’s much less code and is faster for these size MP primes.

Compile: $ crystal build --release --mcpu native MPgenerate.cr
Run as:  $ ./MPgenerate <odd|prime exponent value>
Example: $ ./MPgenerate 3 # start from first odd MP prime

(All the code is in the gist file I previously provided the link to.)

The code|process now generates a candidate mp prime value, and then its
bit size. If the bit size is a prime number, then the mp value is checked.
If it’s a prime, then it’s a Mersenne Prime value of the form 2^p - 1.

Once a new MP prime p is found, it’s used to start the search for the next one.
It then generates mp values using a new|larger generator modulus and residue.
The process will generate mp values that represent all the primes between the
current MP prime and the next. This count is held|shown in primes_checked.
At the process end, it shows how many steps it took, and primes checked, before
reaching the next MP prime.

This essentially is all the Crystal code I need to find the next MP prime.
I downloaded the GIMPS MP prime checker code, and ran its stress test on my
laptop. I now just have to write the code to send my data to it
to check, and get results back, instead of using mp_prime? shown here.
(I may also do a parallel version to shorten the process on my system.)

Before doing that, I need to see if I can determine an optimum search strategy.
Starting from a known MP prime to find the next is the worst-case (but guaranteed)
strategy. But we know we can mathematically do better than that.

Thus the question becomes: how many steps (which equate to candidate primes) can we skip without missing a possible MP prime along the way?

require "big"

# SoZ|PGT primality test for P6|3
def prime?(n)
  return n|1 == 3 if n < 5        # if n 2 or 3
  return false if (n % 6)|4 != 5  # if () not 1 or 5
  r = 5                           # first prime candidate
  while Math.isqrt(n) >= r
    return false if n % r == 0 || n % (r+2) == 0
    r += 6
  end
  true
end

# Lucas-Lehmer primality test
def mp_prime?(p, mp)
  s = 10.to_big_i
  (p-2).times do
    s = (s * s) - 2
    while s > mp; s = (s >> p) + (s & mp) end
  end
  s.divisible_by? mp
end

def mpgen(p, steps = 1u64)
  puts "Starting Mp gen exponent = #{p}"
  puts "Starting Mp gen process at step #{steps}"
  r = (1.to_big_i << p) - 1
  modpnk = modpn = 3.to_big_i << p
  (steps-1).times { modpnk = (modpnk << 2) + modpn }
  primes_checked = 0u64
  loop do
    mp = modpnk + r
    print "\rsteps = #{steps}"
    p = mp.bit_length.to_u64
    if prime?(p)
      primes_checked += 1
      break if mp_prime?(p, mp)
    end
    steps += 1
    modpnk = (modpnk << 2) + modpn
  end
  print "\rsteps = #{steps}\n"
  puts "Primes checked = #{primes_checked}"
  puts "Next Mp prime = #{p}"
  p
end

def tm; t = Time.monotonic; yield; Time.monotonic - t end

def mpgens
  p = (ARGV[0].to_u64 underscore: true)
  loop { puts tm {p = mpgen(p)}; puts }
end

mpgens

Here’s sample output.

Starting Mp gen exponent = 21701
Starting Mp gen process at step 1
steps = 754
Primes checked = 156
Next Mp prime = 23209
00:00:46.274799061

Starting Mp gen exponent = 23209
Starting Mp gen process at step 1
steps = 10644
Primes checked = 2033
Next Mp prime = 44497
00:27:42.649638942

Starting Mp gen exponent = 44497
Starting Mp gen process at step 1
steps = 20873
Primes checked = 3760
Next Mp prime = 86243
03:59:35.948356635

Here 21701, the 25th MP prime (with M25 being 6,533 digits long), is used to
start searching for M26, which is 23209, with M26 being 6,987 digits long.

It took 754 steps, and checked the next 156 primes, with 23209 the last.
On my system (with this code) that took 46.27 secs.

To go from M26 to 44497 (M27, 13,395 digits) took 27 mins 42.64 secs.
To go from M27 to 86243 (M28, 25,962 digits) took about 4 hours.

For comparison, the current largest M52 - 136,279,841, is 41,024,320 digits.
The previous, M51 - 82,589,933, is 24,862,048 digits.

There are 2,901,904 primes greater than 82,589,933 to get to 136,279,841.
(You can imagine now why it took 6 years to find it using GIMPS.)

I’ve shown we really don’t mathematically need|want to check all of them.
It’s just knowing how many the process tells us we can safely skip (not check).

Ruby includes the Prime module in its bundled gems. Although many libraries have been moved to external gems for better maintenance, Prime remains bundled. Why?

One reason is competitive programming. In Japan, some students or beginners get into programming through competitions where prime numbers are common problems. These competitions allow only standard libraries and a few packages.

To prevent Ruby from seeming inadequate and to keep its users competitive, the Prime module stays and is regularly updated. This helps attract new programmers, making Prime a part of Ruby’s bundled gems.

Doc: class Prime - prime: Ruby Standard Library Documentation
GitHub: GitHub - ruby/prime: Prime numbers and factorization library.

I’m quite aware of it. It didn’t do what I needed, and isn’t that fast, so I wrote my own gem – primes-utils.

https://rubygems.org/gems/primes-utils

I even wrote the PRIMES-UTILS HANDBOOK to explain the math and software for it. Someday I’ll update it to add more|faster algorithms, but it’s completely usable as is, and easy to use.

I’ve also written the Crystal version, which is (of course) much faster. I’ve been waiting for Crystal to do true parallel programming before I release it as a shard, and write its companion book. I’ll now add the mp_prime? code to it.

Check out the Rubygem if you get a chance! :slightly_smiling_face:


I have confirmed that primes-utils is still significantly faster than Ruby’s standard library. (By the way, FasterPrime has benchmarks comparable to primes-utils, which you are probably already aware of.)

On another note, I recently came across a blog post mentioning that TensorDock offers affordable GPU server rentals for individuals. I’m not sure what it would mean to use a GPU in Crystal, but I think it would be great if such a thing were possible.

Yesterday (2024/11/24) I ran my updated primes-utils code (which I’ve been sitting on this year) against the Crystal version. I have this extensive test suite (800 LOC) to test all the methods. I tested it on Ruby 3.3.6, TruffleRuby 24.1.1, and Crystal 1.14.0.

The Crystal version is faster, but not by a whole lot. It’s amazing how far Ruby has come in improving performance! The Ruby code runs a few percentage points faster on TruffleRuby than on 3.3.6 (with 3.4 expected Christmas 2024).

The current primes-utils 2.7.0 was released Christmas 2015 to coincide with Ruby 2.7.0, I believe. I’ll try to release the updated version 3.0, to take advantage of Ruby 3.x stuff, by the end of the year.


I found this quite interesting|funny, what Ruby 3.4 can now do on this topic. :smiley:

I use rvm as my Ruby installer.

After switching to Ruby 3.3.6 and TruffleRuby 24.1.1, I added the gmp gem to them.

$ gem install gmp

Then inside an irb session for each I do:

3.3.6 :007 > def tm; t = Time.now; yield; Time.now - t end

3.3.6 :008 > puts tm { puts (1 << 136_279_841) - 1}

Or run from terminal as:

$ time ruby -e "puts (1 << 136_279_841) - 1"

After printing the 41M+ digits, it gives a time to complete of:

Ruby 3.3.6:         5.582558072
TruffleRuby 24.1.1: 99.771412

I have no idea why TruffleRuby is so much worse for this. :thinking:

I’ll definitely include the gmp gem for the updated primes-utils.

By comparison, for Crystal 1.14.0:

# mp52print.cr
# crystal build --release --mcpu native mp52print.cr

require "big"

puts ((1.to_big_i << 136_279_841) - 1)

Gives typically:

$ time ./mp52print
   ...
   ...

./mp52print  4.78s user 0.21s system 84% cpu 5.883 total


This reminds me of an earlier topic.

Ruby out performs Crystal significantly on this numerical algorithm

Programs that use functions from external C libraries often spend a lot of time calling C functions. C is fast, but since we delegate the time-consuming work to C, most of the runtime ends up in C. If TruffleRuby is slower than CRuby here, TruffleRuby may have a higher overhead cost for calling C functions.

JP Camara’s blog on compiling Ruby with libgmp. I think this is about linking libgmp to CRuby.

brew install gmp
rvm reinstall ruby-head --with-gmp-dir=$(brew --prefix gmp)

So we may need to be aware that calling a GMP function through the gmp gem’s C extension library and CRuby itself calling a GMP function internally take slightly different paths, I think.

I think that if you compile CRuby with the GMP library specified, some operations will be delegated to GMP without requiring any special handling.

The use cases for the GMP gem are likely: (1) calling GMP functions directly to perform operations even if CRuby is not compiled with GMP, and (2) identifying operations in CRuby that are not delegated to GMP and having GMP handle them instead.

In general, there is a conversion cost when calling C functions from Ruby. For example, Ruby arrays and C pointers need to be converted back and forth. I’ve heard that simple numbers don’t incur much conversion cost due to immediate values, but how about large numbers?

However, if you continue the calculation using C types to avoid the Ruby-to-C conversion on every function call, it becomes the same as writing in C, and the advantage of using Ruby diminishes.

I’m not really sure about the difficult cases.
I’ve heard that in Crystal there is no cost to calling C functions, which is what I like about Crystal.

I realized that installing Ruby 3.4.0-preview1 with rvm automatically compiles it with the libgmp installed on my system, which Crystal uses too. So installing the gmp gem isn’t necessary, and has no effect.

What’s interesting is that both TruffleRuby 24.1.1 and JRuby 9.4.9.0 take about the same time, ~100 secs, and both run on a Java foundation. But in general, TruffleRuby is way faster than JRuby for most other numerical computations|algorithms, as it’s based on GraalVM. Maybe this is a case TruffleRuby hasn’t optimized (maybe it’s not the computation but the printing, a WAG?).

So it seems rvm compiled in gmp for all the Rubies I’ve installed, and they could always do arbitrary-size integer math with it. And 3.4-preview1 is no faster than 3.3.6 for this on my system.

Ruby has shipped YJIT by default since Ruby 3.3, but you still need to enable it with --yjit.

ruby --version
ruby --yjit --version

You can check if YJIT is enabled:

p RubyVM::YJIT.enabled?

You can also enable it dynamically, which Rails apps seem to do:

RubyVM::YJIT.enable

I tried running primes-utils with YJIT, but it didn’t get much faster. YJIT only optimizes methods that run very often, so some options like thresholds might need adjusting. If JIT doesn’t improve speed much, GraalVM might not help either. If most of the time is spent in GMP, JIT might not make much difference, I think.