CRYSTAL_WORKERS
This application has made me realize the utility of having CRYSTAL_WORKERS.
I can increase the size of the number range I can process by adjusting the number of CRYSTAL_WORKERS I set manually at runtime.
To use 8 threads simultaneously, I reduced the input numbers from 64 to 62 bits.
Using htop to monitor memory/thread activity, I was eventually able to get this to run for a range of 19,505,950. This was on the razor’s edge of where it would run or crash, and backing off to CRYSTAL_WORKERS=7 allowed it to run every time, though slower.
➜ crystal-projects CRYSTAL_WORKERS=8 ./twinprimes_ssoz 4410000000000000000 4410000000019505950
threads = 8
using Prime Generator parameters for P7
segment size = 92887 resgroups; seg array is [1 x 1452] 64-bits
twinprime candidates = 1393305; resgroups = 92887
GC Warning: Repeated allocation of very large block (appr. size 1610616832):
May lead to memory leak and poor performance
each of 15 threads has nextp[2 x 102886522] array
setup time = 10.332723 secs
perform twinprimes ssoz sieve
GC Warning: Repeated allocation of very large block (appr. size 1646186496):
May lead to memory leak and poor performance
2 of 15 twinpairs doneGC Warning: Repeated allocation of very large block (appr. size 1646186496):
May lead to memory leak and poor performance
15 of 15 twinpairs done
sieve time = 26.278561 secs
total time = 36.611284 secs
last segment = 92887 resgroups; segment slices = 1
total twins = 13835; last twin = 4410000000019503222+/-1
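To see why the worker count matters so much here, a rough back-of-envelope estimate of the nextp memory can be sketched in Python. It assumes each nextp entry is a 64-bit (8-byte) integer, which is my assumption, though it is consistent with the GC warning above: 2 x 102886522 x 8 = 1,646,184,352 bytes, within a few KB of the reported 1646186496. The function names are made up for illustration, and the estimate ignores everything except the nextp arrays.

```python
# Back-of-envelope estimate of memory held by the per-thread nextp arrays.
# Assumption (not confirmed by the program source): entries are 8 bytes each.
BYTES_PER_ENTRY = 8


def nextp_bytes(entries_per_row, rows=2):
    """Size in bytes of one thread's nextp[rows x entries_per_row] array."""
    return rows * entries_per_row * BYTES_PER_ENTRY


def peak_nextp_bytes(entries_per_row, workers):
    """Bytes held by nextp arrays if `workers` threads are alive at once."""
    return workers * nextp_bytes(entries_per_row)


# 102886522 is taken from the log line "nextp[2 x 102886522] array".
for workers in (8, 7):
    gib = peak_nextp_bytes(102_886_522, workers) / 2**30
    print(f"{workers} workers: {gib:.1f} GiB in nextp arrays")
```

Each worker dropped frees roughly one array's worth of memory (about 1.65 GB in this run), which would explain why going from 8 to 7 workers moves the run back from the edge.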
I was able to reliably increase the range to 20+M by using 7 workers, as below.
➜ crystal-projects CRYSTAL_WORKERS=7 ./twinprimes_ssoz 4410000000000000000 4410000000020505950
threads = 8
using Prime Generator parameters for P7
segment size = 97649 resgroups; seg array is [1 x 1526] 64-bits
twinprime candidates = 1464735; resgroups = 97649
GC Warning: Repeated allocation of very large block (appr. size 805310464):
May lead to memory leak and poor performance
each of 15 threads has nextp[2 x 102886522] array
setup time = 11.12939 secs
perform twinprimes ssoz sieve
GC Warning: Repeated allocation of very large block (appr. size 1646186496):
May lead to memory leak and poor performance
1 of 15 twinpairs doneGC Warning: Repeated allocation of very large block (appr. size 1646186496):
May lead to memory leak and poor performance
15 of 15 twinpairs done
sieve time = 33.82635 secs
total time = 44.95574 secs
last segment = 97649 resgroups; segment slices = 1
total twins = 14564; last twin = 4410000000020502132+/-1
Notice that in both instances, the number of twin primes is about 1% of the number of twin prime candidates in each range.
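The ratio is easy to check from the logged numbers. This small Python snippet just divides the "total twins" value by the "twinprime candidates" value for each of the two runs above; the helper name is only for illustration.

```python
# (total twins, twinprime candidates) copied from the two runs above.
runs = {
    "8 workers": (13_835, 1_393_305),
    "7 workers": (14_564, 1_464_735),
}


def twin_ratio(twins, candidates):
    """Twin primes found, as a percentage of the twin prime candidates."""
    return 100.0 * twins / candidates


for label, (twins, candidates) in runs.items():
    print(f"{label}: {twin_ratio(twins, candidates):.2f}% of candidates")
```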
Being able to manually set/limit the usable number of threads at runtime is a feature I’m not aware of other languages having. In cases like this, I can now make the most of available memory by limiting the number of simultaneously running threads, and so get results for larger ranges.
Again, kudos to the devs for thinking about this.
(Though I suggest changing the default from 4 workers to the system’s thread count when compiling for multi-threading, so users won’t need to manually increase workers to get max performance, which is the usual desired case.)
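In the meantime, a small launcher can fill in the system thread count itself. The following is only a sketch in Python, not anything Crystal provides: worker_env and run are hypothetical helper names, and the fallback of 4 simply mirrors the default mentioned above. An explicitly set CRYSTAL_WORKERS is left untouched, so you can still lower it to save memory.

```python
import os
import subprocess


def worker_env(environ, cores):
    """Copy of `environ` with CRYSTAL_WORKERS defaulted to `cores`.

    A value the user already set is kept as-is.
    """
    env = dict(environ)
    env.setdefault("CRYSTAL_WORKERS", str(cores))
    return env


def run(cmd):
    """Run `cmd` (a list of strings) with CRYSTAL_WORKERS filled in."""
    cores = os.cpu_count() or 4  # fall back to 4 if the count is unknown
    return subprocess.run(cmd, env=worker_env(os.environ, cores)).returncode


# Example call (not executed here):
# run(["./twinprimes_ssoz", "4410000000000000000", "4410000000019505950"])
```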
I hope this little tip is helpful to others doing multi-threading if they encounter similar memory use issues.
ADDITION
Using only 2 workers I was able to process a range of over 1 billion 64-bit numbers, with headroom to spare. I think this is really damn impressive!
➜ crystal-projects CRYSTAL_WORKERS=2 ./twinprimes_ssoz 18400000000000000000 18400000001000005950
threads = 8
using Prime Generator parameters for P7
segment size = 131072 resgroups; seg array is [1 x 2048] 64-bits
twinprime candidates = 71429010; resgroups = 4761934
GC Warning: Repeated allocation of very large block (appr. size 805310464):
May lead to memory leak and poor performance
each of 15 threads has nextp[2 x 203034841] array
setup time = 21.855971 secs
perform twinprimes ssoz sieve
1 of 15 twinpairs doneGC Warning: Repeated allocation of very large block (appr. size 3248558080):
May lead to memory leak and poor performance
10 of 15 twinpairs doneGC Warning: Repeated allocation of very large block (appr. size 3248558080):
May lead to memory leak and poor performance
15 of 15 twinpairs done
sieve time = 385.933012 secs
total time = 407.788983 secs
last segment = 43342 resgroups; segment slices = 37
total twins = 670310; last twin = 18400000001000005188+/-1