Concurrent HTTP request performance

I’m trying to fetch a large set of records from an HTTP API concurrently and have structured my code like this:

require "http/client"
require "json"

fetch_queue = Channel(Int32).new
results_queue = Channel(JSON::Any).new

def fetch(url : String)
  res = HTTP::Client.get(url)
  JSON.parse(res.body)
end

spawn do
  (1..10000000).each do |id|
    fetch_queue.send(id)
  end
end

start_time = Time.utc
count = 0

spawn do
  loop do
    res = results_queue.receive

    count += 1
    if count % 1000 == 0
      elapsed = Time.utc - start_time
      seconds = elapsed.total_seconds
      rate = count / seconds
      puts "#{count} (#{elapsed}) @ #{rate.round}/sec"
    end
  end
end

(1..128).each do |i|
  spawn do
    loop do
      id = fetch_queue.receive
      res = fetch("https://example.com/api/items/#{id}/")
      results_queue.send(res)
    end
  end
end

sleep

I’m getting around 200 results per second at the moment. A similar implementation in Node gives me around 1600 results per second (while also persisting them to disk). Am I doing something wrong in the Crystal code above? I was expecting significantly higher throughput. I even tried compiling with crystal build --release -Dpreview_mt to enable multiple threads, but that didn’t seem to help much (probably because this workload is so IO-bound).

Any suggestions?

Thanks!

  • Increase the fiber count?
  • Maybe the Node HTTP client shares the connection, or uses a connection pool?

This doesn’t seem to help. I’ve tried spawning 1000 fibers on the fetching side and I still hover around 200 requests/sec.

Maybe. I’m using the built-in HTTP::Client.get in my fetch function; I’ll need to look into the docs on that.

That seems very likely. I would certainly expect that.

In Crystal, HTTP::Client.get is a one-off request that establishes a new connection every time. That’s a huge overhead when you’re hitting the same host again and again.

You could easily avoid that by initializing a dedicated HTTP::Client instance for every worker fiber. Requests within a fiber can then reuse the same connection.
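A minimal sketch of that idea, made self-contained by running the fetches against a local HTTP::Server instead of the real API (the /api/items/<id>/ path and fiber/channel names are carried over from the original post; the fiber and request counts are shrunk just for illustration):

```crystal
require "http/client"
require "http/server"
require "json"

# Local stand-in for the remote API so the example runs without network access.
server = HTTP::Server.new do |ctx|
  id = ctx.request.path.split('/')[-2]? # "/api/items/<id>/" -> "<id>"
  ctx.response.content_type = "application/json"
  ctx.response.print({"id" => id}.to_json)
end
address = server.bind_tcp "127.0.0.1", 0 # port 0 picks a free port
spawn { server.listen }

fetch_queue   = Channel(Int32).new
results_queue = Channel(JSON::Any).new

4.times do
  spawn do
    # One HTTP::Client per fiber: the connection is established once and then
    # reused for every request. HTTP::Client is not fiber-safe, so never share
    # a single instance between fibers.
    client = HTTP::Client.new(address.address, address.port)
    loop do
      id  = fetch_queue.receive
      res = client.get("/api/items/#{id}/")
      results_queue.send(JSON.parse(res.body))
    end
  end
end

spawn { (1..20).each { |id| fetch_queue.send(id) } }

20.times { results_queue.receive }
puts "fetched 20 results over 4 reused connections"
```

Against a real HTTPS endpoint you’d construct the client as HTTP::Client.new("example.com", tls: true) and issue relative-path gets the same way.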

Thanks so much! I’m reusing a connection per fiber now and getting ~1900 results/second. Amazing.


Yes, which is why the suggestion was for an HTTP::Client per fiber, versus sharing a single or pool of clients between fibers.


What could be used for pooling? Repurpose the existing DB::Pool from the crystal-db project (GitHub - crystal-lang/crystal-db: Common db api for crystal)? I read somewhere that it can be used for this…

Pooling doesn’t really make much sense in this use case, because all workers are constantly communicating with the same endpoint. Each worker just gets its own client and connection; there’s no need for pooling overhead.

(In case this is a generic question about pooling HTTP connections, please start a new thread for that)