TIL: crystal-pg low performance with SSL

Hi everyone,

I wanted to share a recent experience I had with crystal-pg that might be helpful for others.

I’ve been benchmarking Crystal against Go and noticed that my Crystal implementation was performing 5-10 times slower than the Go version when running the same SQL query.

My initial tests with SQLite showed comparable performance to Go, so the bottleneck was clearly with PostgreSQL. I spent a lot of time tweaking connection parameters (retry_attempts, pool_size, max_connections, etc.) and even tried compiling with -D preview_mt, but nothing seemed to close the significant performance gap.

On a whim, I added sslmode=disable to my connection string, and the difference was instant and dramatic. Performance shot up, bringing it nearly on par with the Go implementation.
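For context, the change is just a query parameter on the connection URI (credentials and database name here are placeholders):

```
# before: the driver negotiates SSL with the server
postgres://user:pass@localhost:5432/bench

# after: force a plaintext connection
postgres://user:pass@localhost:5432/bench?sslmode=disable
```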

I haven’t had the chance to dig into why enabling SSL has such a massive performance impact with crystal-pg, but the difference is night and day.

If you’re running into similar performance issues with PostgreSQL, I highly recommend giving this a try. It might just be the solution you’re looking for. I’d be curious to hear if anyone else has encountered this or has any insights into the cause.


That’s a good find. I wonder, did you try playing with the Crystal version and this to see if there was some change with Crystal that would have affected this?

Before trying Go or diving into code changes, I actually played with Crystal versions: I tried 1.14.0, 1.16.0, 1.17.1, and master. No luck, hence I pushed forward with building a reproducible version in Go and narrowed it down to the driver. It wasn’t easy though :rofl:


It’s not clear to me from your message if Go’s implementation is using SSL or not. Could you clarify so it’s an :red_apple: to :red_apple: comparison?


That’s a good question, I only used sslmode=disable with Crystal’s connection string. I assume that Go uses SSL by default…will check and verify.

Actually, some help might be useful. This is the Go PostgreSQL driver: GitHub - jackc/pgx: PostgreSQL driver and toolkit for Go. I did a quick code search and couldn’t see any default for sslmode. Can anyone dig into this?

IIUC the configTLS function is responsible for configuring TLS options for a given host, and from what I can see, it uses sslmode=prefer by default.


Regardless of whether the Go comparison module uses TLS: It’s very surprising that toggling TLS makes a significant difference in crystal-pg.
With modern hardware and software, TLS shouldn’t have much overhead over unencrypted traffic.

Your findings suggest that something is not performing as well as it should.

Okay, so I did specify sslmode=prefer to make an equal comparison with the Go version. It didn’t change anything; still slow.

For the curious, I’ve created a GitHub repository where you can easily run and reproduce the results.

Just a note here: sslmode=prefer doesn’t mean it actually does use it.
Maybe find a way to check if it is active, or set the thing to something like sslmode=just-do-it-or-else. Or whatever else makes Go fail if SSL is not active.

…is kind of interesting; seems they override to other modes based on certain other properties. Unless I’m reading this wrong, pgx is more of a complete rewrite of libpq as opposed to a wrapper around it. This code appears to be emulating what libpq does, but if it’s not actually using libpq, I’m not sure it’s an apples-to-apples comparison.

So, any update on this?

I’m not seeing Crystal having lower performance than Golang here. I wrote a program in both Crystal and Go that measures the CPU time spent in communicating with a Postgres DB by fetching 1M rows, each with a UUID, a string, and a timestamptz. I’m seeing Crystal outperform Go.

The DATABASE_URL environment variable I’m using sets sslmode=require, so both the Crystal and Go programs are guaranteed to be using SSL.

Crystal

➜  crystal git:(main) ✗ shards build --release
Dependencies are satisfied
Building: benchmark_pg
➜  crystal git:(main) ✗ time bin/benchmark_pg
00:00:00.552717000
bin/benchmark_pg  0.41s user 0.21s system 24% cpu 2.536 total
➜  crystal git:(main) ✗ time bin/benchmark_pg
00:00:00.474247000
bin/benchmark_pg  0.34s user 0.18s system 23% cpu 2.185 total
➜  crystal git:(main) ✗ time bin/benchmark_pg
00:00:00.604108000
bin/benchmark_pg  0.41s user 0.24s system 24% cpu 2.686 total
➜  crystal git:(main) ✗ time bin/benchmark_pg
00:00:00.551447000
bin/benchmark_pg  0.39s user 0.21s system 29% cpu 2.023 total

Go

➜  go git:(main) ✗ go build benchmark_pg.go
➜  go git:(main) ✗ time ./benchmark_pg
0.665249
./benchmark_pg  0.34s user 0.37s system 31% cpu 2.245 total
➜  go git:(main) ✗ time ./benchmark_pg
0.662801
./benchmark_pg  0.34s user 0.36s system 33% cpu 2.065 total
➜  go git:(main) ✗ time ./benchmark_pg
0.616357
./benchmark_pg  0.33s user 0.33s system 32% cpu 2.028 total
➜  go git:(main) ✗ time ./benchmark_pg
0.527509
./benchmark_pg  0.28s user 0.28s system 25% cpu 2.223 total

The Golang program is using more CPU time than the Crystal program.

Thanks a lot @jgaskins .

Could you please try my repo GitHub - sdogruyol/crystal-vs-go and see if the results are the same.

I added some code to your implementation to show the CPU usage during the request — pretty much identical to how I was measuring mine in both Crystal and Go. I ran it against a Postgres DB on DigitalOcean to ensure it was using SSL and these were the results:

Crystal/Kemal

00:00:00.009535000
2026-01-01T21:09:19.650998Z   INFO - kemal: 200 GET /posts 265.01ms
00:00:00.008213000
2026-01-01T21:09:20.378518Z   INFO - kemal: 200 GET /posts 155.46ms
00:00:00.008710000
2026-01-01T21:09:21.188990Z   INFO - kemal: 200 GET /posts 242.85ms
00:00:00.011435000
2026-01-01T21:10:14.273505Z   INFO - kemal: 200 GET /posts 326.59ms
00:00:00.007369000
2026-01-01T21:10:15.943079Z   INFO - kemal: 200 GET /posts 182.49ms
00:00:00.008418000
2026-01-01T21:10:17.378431Z   INFO - kemal: 200 GET /posts 314.8ms
00:00:00.009389000
2026-01-01T21:10:18.399742Z   INFO - kemal: 200 GET /posts 193.64ms
00:00:00.008494000
2026-01-01T21:10:21.047123Z   INFO - kemal: 200 GET /posts 279.58ms

Go

0.010016
[GIN] 2026/01/01 - 16:08:33 | 200 |  227.407667ms |             ::1 | GET      "/posts"
0.012659
[GIN] 2026/01/01 - 16:08:35 | 200 |  260.933333ms |             ::1 | GET      "/posts"
0.011991
[GIN] 2026/01/01 - 16:08:43 | 200 |  351.939333ms |             ::1 | GET      "/posts"
0.015964
[GIN] 2026/01/01 - 16:08:45 | 200 |    323.9145ms |             ::1 | GET      "/posts"
0.012587
[GIN] 2026/01/01 - 16:08:46 | 200 |  346.702833ms |             ::1 | GET      "/posts"
0.014628
[GIN] 2026/01/01 - 16:08:47 | 200 |   275.18275ms |             ::1 | GET      "/posts"
0.011821
[GIN] 2026/01/01 - 16:15:08 | 200 |   367.26025ms |             ::1 | GET      "/posts"

The Go version was using 10-16ms of CPU time per request and the Crystal version was using 7-11ms, so the Crystal implementation was using less CPU time. Ignore the actual request latencies because I’m a few hundred miles from this Postgres database so it’s bound to fluctuate. I didn’t deploy the HTTP servers to the same infrastructure because building containers and deploying them is more work than I was trying to put in here. I mainly wanted to confirm whether crystal-pg was indeed less performant than pgx for Go and I’m seeing the opposite results.

I also verified that the payloads were the same between them:

➜  crystal git:(master) ✗ curl -s localhost:3000/posts | wc -c
  545117
➜  crystal git:(master) ✗ curl -s localhost:8080/posts | wc -c
  545117

The main culprit I can see, which I run into all the time when load-testing locally, is that your readme prescribes running wrk with a concurrency of 100 but your Postgres URL doesn’t specify max_idle_pool_size. I don’t know how pgx handles connection pooling (maybe it uses a high-water mark capped at a maximum size, like ActiveRecord does?), but crystal-db uses a fully elastic connection pool. So when connections are checked back into the pool after running a query, they will be closed and discarded if the number of connections in the pool is >= max_idle_pool_size. Your best bet here is to set max_idle_pool_size=100 (or whatever concurrency you’re expecting) in your DB URL.

My test result:

go:

```
╰──➤ $ 1 wrk -c 100 -d 30 http://localhost:8080/posts
Running 30s test @ http://localhost:8080/posts
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   133.47ms  113.83ms 841.08ms   82.16%
    Req/Sec   394.77     65.55   588.00     74.29%
  23625 requests in 30.09s, 12.00GB read
  Socket errors: connect 0, read 0, write 0, timeout 36
Requests/sec:    785.26
Transfer/sec:    408.40MB
```

Crystal (not built with --release)

```
╰──➤ $ 130 wrk -c 100 -d 30 http://localhost:3000/posts
Running 30s test @ http://localhost:3000/posts
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.80s   356.23ms   2.00s    91.32%
    Req/Sec    21.79      9.85     60.00    65.35%
  1263 requests in 30.03s, 661.86MB read
  Socket errors: connect 0, read 0, write 0, timeout 549
Requests/sec:     42.06
Transfer/sec:     22.04MB
```

Crystal (with --release)

```
╰──➤ $ wrk -c 100 -d 30 http://localhost:3000/posts
Running 30s test @ http://localhost:3000/posts
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   395.96ms   79.15ms   1.99s    92.88%
    Req/Sec    95.23     22.77    171.00    67.17%
  5701 requests in 30.02s, 2.90GB read
  Socket errors: connect 0, read 0, write 0, timeout 177
Requests/sec:    189.93
Transfer/sec:     99.08MB
```

Both Crystal runs produce a Closed stream (IO::Error) error.

```
2026-01-06T15:52:02.855330Z ERROR - http.server: Unhandled exception on HTTP::Handler
Closed stream (IO::Error)
from /home/zw963/Crystal/share/crystal/src/http/server/response.cr:203:7 in 'check_headers'
from /home/zw963/Crystal/share/crystal/src/http/server/response.cr:72:7 in 'content_type='
from lib/kemal/src/kemal/helpers/templates.cr:25:3 in 'render_500'
from lib/kemal/src/kemal/exception_handler.cr:28:7 in 'call'
from /home/zw963/Crystal/share/crystal/src/http/server/handler.cr:30:7 in 'call_next'
from lib/kemal/src/kemal/head_request_handler.cr:57:7 in 'call'
from /home/zw963/Crystal/share/crystal/src/http/server/handler.cr:30:7 in 'call_next'
from /home/zw963/Crystal/share/crystal/src/time.cr:361:5 in 'call'
from /home/zw963/Crystal/share/crystal/src/http/server/handler.cr:30:7 in 'call_next'
from lib/kemal/src/kemal/init_handler.cr:15:7 in 'call'
from /home/zw963/Crystal/share/crystal/src/http/server/request_processor.cr:51:11 in 'process'
from /home/zw963/Crystal/share/crystal/src/http/server.cr:521:5 in 'handle_client'
from /home/zw963/Crystal/share/crystal/src/http/server.cr:451:5 in '->'
from /home/zw963/Crystal/share/crystal/src/fiber.cr:170:11 in 'run'
from /home/zw963/Crystal/share/crystal/src/fiber.cr:105:3 in '->'
from ???
```

When you’re comparing Crystal and Go, how much CPU is the Crystal process using and how much is the Go process using? On my machine, I see the Go program handling 2x as many requests per second as the Crystal one, but it’s also using over 9x as much CPU to do it.

To be more specific, these were the results I was getting:

Crystal

➜  crystal-vs-go git:(master) wrk http://localhost:3000/posts
Running 10s test @ http://localhost:3000/posts
  2 threads and 10 connections
^C  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    22.42ms    5.13ms  72.77ms   86.12%
    Req/Sec   224.44     22.32   260.00     80.49%
  1837 requests in 4.12s, 0.93GB read
Requests/sec:    446.17
Transfer/sec:    232.02MB

Go

➜  crystal-vs-go git:(master) wrk http://localhost:8080/posts
Running 10s test @ http://localhost:8080/posts
  2 threads and 10 connections
^C  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    11.16ms    2.18ms  23.97ms   77.73%
    Req/Sec   447.75     24.13   494.00     75.00%
  1250 requests in 1.40s, 650.00MB read
Requests/sec:    891.51
Transfer/sec:    463.59MB

So the Crystal version isn’t “slower”, it’s just using an order of magnitude fewer resources. When I run the Crystal app with -Dpreview_mt to be able to use more resources, it looks more like this:

Crystal with -D preview_mt

➜  crystal-vs-go git:(master) wrk http://localhost:3000/posts
Running 5m test @ http://localhost:3000/posts
  2 threads and 10 connections
^C  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.86ms    2.60ms  55.02ms   77.83%
    Req/Sec   644.32     55.76     1.09k    81.77%
  24646 requests in 19.22s, 12.52GB read
  Socket errors: connect 0, read 0, write 0, timeout 10
Requests/sec:   1282.22
Transfer/sec:    666.79MB

So Crystal can do more (1282 req/s for Crystal vs 891 for Go) while only using half as much CPU (4.5 cores with Crystal vs 9 with Go) and yields better latency on average (8ms with Crystal vs 11ms with Go).

Using hey, we can see the latency distribution even better using percentiles:

Crystal

➜  crystal-vs-go git:(master) hey -n 10000 -c 10 http://localhost:3000/posts

Summary:
  Total:	6.7732 secs
  Slowest:	0.0269 secs
  Fastest:	0.0027 secs
  Average:	0.0067 secs
  Requests/sec:	1476.4066


Response time histogram:
  0.003 [1]	|
  0.005 [2394]	|■■■■■■■■■■■■■■■■■■■■
  0.008 [4762]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.010 [2248]	|■■■■■■■■■■■■■■■■■■■
  0.012 [499]	|■■■■
  0.015 [83]	|■
  0.017 [10]	|
  0.020 [1]	|
  0.022 [0]	|
  0.024 [1]	|
  0.027 [1]	|


Latency distribution:
  10% in 0.0046 secs
  25% in 0.0051 secs
  50% in 0.0065 secs
  75% in 0.0077 secs
  90% in 0.0092 secs
  95% in 0.0101 secs
  99% in 0.0123 secs

Go

➜  crystal-vs-go git:(master) hey -n 10000 -c 10 http://localhost:8080/posts

Summary:
  Total:	9.5831 secs
  Slowest:	0.0205 secs
  Fastest:	0.0054 secs
  Average:	0.0096 secs
  Requests/sec:	1043.5016


Response time histogram:
  0.005 [1]	|
  0.007 [3]	|
  0.008 [402]	|■■
  0.010 [7843]	|■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.011 [1133]	|■■■■■■
  0.013 [287]	|■
  0.014 [210]	|■
  0.016 [96]	|
  0.017 [17]	|
  0.019 [3]	|
  0.020 [5]	|


Latency distribution:
  10% in 0.0087 secs
  25% in 0.0090 secs
  50% in 0.0093 secs
  75% in 0.0097 secs
  90% in 0.0104 secs
  95% in 0.0121 secs
  99% in 0.0147 secs

Crystal does better all throughout the latency distribution at the same concurrency.


Also, when you’re benchmarking requests to localhost, a concurrency of 100 is too high to yield meaningful results. It oversaturates the server beyond what any production workload would look like — you’d have scaled out long before you reached 100 concurrent requests on a single CPU core. You want to use higher concurrency values only when latency is too high to keep the server saturated with wrk’s default settings, but fast requests to localhost rarely need that.

How many cores does Go default to? Crystal uses 4 unless you configure it to use more via the CRYSTAL_WORKERS env var. I’d also be curious to see how the new Parallel execution context changes things compared to -Dpreview_mt.

Default is the number of logical CPU cores available, plus IIRC a couple of extra threads for housekeeping (GC, goroutine scheduler, etc.).