I’m trying to fetch a large set of records from an HTTP API concurrently and have structured my code like this:
require "http/client"
require "json"
fetch_queue = Channel(Int32).new
results_queue = Channel(JSON::Any).new
def fetch(url : String)
res = HTTP::Client.get(url)
JSON.parse(res.body)
end
spawn do
(1..10000000).each do |id|
fetch_queue.send(id)
end
end
start_time = Time.utc
count = 0
spawn do
loop do
res = results_queue.receive
count += 1
if count % 1000 == 0
elapsed = Time.utc - start_time
seconds = elapsed.total_seconds
rate = count / seconds
puts "#{count} (#{elapsed}) @ #{rate.round}/sec"
end
end
end
(1..128).each do |i|
spawn do
loop do
id = fetch_queue.receive
res = fetch("https://example.com/api/items/#{id}/")
results_queue.send(res)
end
end
end
sleep
I’m getting around 200 results per second at the moment. I tried implementing something similar with Node and I’m getting around 1600 results per second (while also persisting them to disk). I’m curious if I’m doing something wrong in the Crystal code above because I was expecting significantly higher throughput. I even tried compiling with crystal build --release -Dpreview_mt
to get multiple threads but that didn’t seem to help much (probably since this is so IO-bound).
Any suggestions?
Thanks!