Parallel Processing in Crystal

I thought I’d provide a real world example of how parallel processing can save your ass using Crystal.

I had a function where I was doing allot of numerical processing for a function.
I started running it for one set of inputs, which I knew was going to take allot of time, just to get a sense of how long it would take. I put in an internal progress indicator that told me what percentage of the processing was done.

So I started it on previous Thursday at 11pm and let it run overnight. After letting it run for 24 hours it had done ~6%. So I let it continue, and Tuesday (3 days ago) it was at ~31%. At that rate I figured it would finish in 16 days. So I said to myself, there’s gotta be a better way. Life is too short!

So on Wednesday I woke up, and started writing a multi-threaded version.
I broke the process into independent segments, and first wrote a segmented version run serially (I actually did this in Ruby because it’s so much easier to design/run the code dynamically). I got that working and tested in Ruby.

I then did a Crystal serial version of the Ruby code, and got that working.

Once that was done, all I had to do was convert the serial loop into a parallel process. I tested it with small values first to make sure I got correct answers. I gradually used larger values, and determined the Crystal parallel version was about 20x faster then the serial version for large inputs.

I did all this by early afternoon Wednesday, taking my time.
Now I was ready for the Big Show!

I have 8 threads on my I7 laptop, which was still running the original Crystal serial program, now about 36% done. So before going to bed on Wednesday, around 12am I started the parallel version with the inputs for the running program. I set CRYSTAL_WORKERS=7, to use the other 7 threads, and let it run overnight.

Since I already determined the parallel version should be around 20x faster, well if the old process took 20 days, the parallel one should take around a day, but since it was projected to take 16 day,s the parallel should be much less than 24 hours.

Ultimately when I came in Thursday evening around 7:20pm and saw in htop the threads had stopped, and looked at the program output, which I timed using time, and the total run time was 19 hrs 42 mins!

I’m sharing this to let people know that Crystal’s multi-threading does work. It can (should) get better, but it’s usable right now, so think how you could possibly use it to make your life easier.

Below is the code for the main routine that has the multi-threading code.

# $ crystal build gapsegparallel.cr -Dpreview_mt --release
# $ time CRYSTAL_WORKERS=7 ./gagsegparallel

def gaps(p, n = 8)
  primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47] of UInt64
  i = primes.index(p).not_nil!
  modpn  = primes[0..i].product
  rescnt = primes[0..i].map{ |p| p-1 }.product
  puts "modp#{p} = #{modpn}"
  puts "rescnt = #{rescnt}"
  a_n = Hash(UInt64, UInt64).new(0)  # hash of gap coefficients a_n
  gaps = [] of Hash(UInt64, UInt64)  # array or gap hashes for each segment
  i = 0
  done = Channel(Nil).new()          # process each segment in parallel
  rangeres(modpn, n).each_cons_pair do |n1, n2| 
    spawn do 
      gaps << gapcounts(n1, n2, modpn)
      print "\r#{i += 1} of #{n} segments done"
      done.send(nil)
  end end
  n.times { done.receive }           # wait for all threads to finish
  puts
  gaps.reduce { |h1, h2| a_n = h1.merge(h2) { |key, v1, v2| v1 + v2 } }
  a_n = a_n.to_a.map{ |k, v| (k == 2 || k == 4) ? [k, v*2 | 1] : [k, v*2]}.to_h
  a_n.to_a.sort
end
12 Likes

What a cliffhanger you put there :stuck_out_tongue: . Thanks for sharing!

1 Like