To be more specific, these were the results I was getting:
Crystal
➜ crystal-vs-go git:(master) wrk http://localhost:3000/posts
Running 10s test @ http://localhost:3000/posts
2 threads and 10 connections
^C  Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency    22.42ms    5.13ms   72.77ms   86.12%
      Req/Sec    224.44     22.32    260.00    80.49%
1837 requests in 4.12s, 0.93GB read
Requests/sec: 446.17
Transfer/sec: 232.02MB
Go
➜ crystal-vs-go git:(master) wrk http://localhost:8080/posts
Running 10s test @ http://localhost:8080/posts
2 threads and 10 connections
^C  Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency    11.16ms    2.18ms   23.97ms   77.73%
      Req/Sec    447.75     24.13    494.00    75.00%
1250 requests in 1.40s, 650.00MB read
Requests/sec: 891.51
Transfer/sec: 463.59MB
So the Crystal version isn’t “slower”; it’s just using an order of magnitude fewer resources. When I run the Crystal app with -Dpreview_mt so it can use more of them, it looks more like this:
Crystal with -Dpreview_mt
➜ crystal-vs-go git:(master) wrk http://localhost:3000/posts
Running 5m test @ http://localhost:3000/posts
2 threads and 10 connections
^C  Thread Stats   Avg      Stdev     Max   +/- Stdev
      Latency     7.86ms    2.60ms   55.02ms   77.83%
      Req/Sec    644.32     55.76     1.09k    81.77%
24646 requests in 19.22s, 12.52GB read
Socket errors: connect 0, read 0, write 0, timeout 10
Requests/sec: 1282.22
Transfer/sec: 666.79MB
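For reference, a multi-threaded Crystal build like the one above can be produced roughly as follows (the source path and worker count here are illustrative, not taken from the repo):

```shell
# Compile with the experimental multi-threading flag
# (src/server.cr is a placeholder for the actual entry point)
crystal build --release -Dpreview_mt src/server.cr -o server

# CRYSTAL_WORKERS sets how many scheduler threads the runtime starts
CRYSTAL_WORKERS=4 ./server
```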
So Crystal achieves higher throughput (1282 req/s for Crystal vs 891 for Go) while using only about half as much CPU (4.5 cores with Crystal vs 9 with Go), and it yields better average latency (8ms with Crystal vs 11ms with Go).
Using hey, we can inspect the latency distribution in more detail with percentiles:
Crystal
➜ crystal-vs-go git:(master) hey -n 10000 -c 10 http://localhost:3000/posts
Summary:
Total: 6.7732 secs
Slowest: 0.0269 secs
Fastest: 0.0027 secs
Average: 0.0067 secs
Requests/sec: 1476.4066
Response time histogram:
0.003 [1] |
0.005 [2394] |■■■■■■■■■■■■■■■■■■■■
0.008 [4762] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.010 [2248] |■■■■■■■■■■■■■■■■■■■
0.012 [499] |■■■■
0.015 [83] |■
0.017 [10] |
0.020 [1] |
0.022 [0] |
0.024 [1] |
0.027 [1] |
Latency distribution:
10% in 0.0046 secs
25% in 0.0051 secs
50% in 0.0065 secs
75% in 0.0077 secs
90% in 0.0092 secs
95% in 0.0101 secs
99% in 0.0123 secs
Go
➜ crystal-vs-go git:(master) hey -n 10000 -c 10 http://localhost:8080/posts
Summary:
Total: 9.5831 secs
Slowest: 0.0205 secs
Fastest: 0.0054 secs
Average: 0.0096 secs
Requests/sec: 1043.5016
Response time histogram:
0.005 [1] |
0.007 [3] |
0.008 [402] |■■
0.010 [7843] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.011 [1133] |■■■■■■
0.013 [287] |■
0.014 [210] |■
0.016 [96] |
0.017 [17] |
0.019 [3] |
0.020 [5] |
Latency distribution:
10% in 0.0087 secs
25% in 0.0090 secs
50% in 0.0093 secs
75% in 0.0097 secs
90% in 0.0104 secs
95% in 0.0121 secs
99% in 0.0147 secs
Crystal does better throughout the entire latency distribution at the same concurrency.
Also, when you’re benchmarking requests to localhost, a concurrency of 100 is too high to yield meaningful results: it oversaturates the server beyond what any production workload would look like, since you’d have scaled out long before reaching 100 concurrent requests on a single CPU core. Higher concurrency values are only warranted when latency is too high for wrk’s default settings to keep the server saturated, and fast requests to localhost rarely need that.
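In practice that means staying close to wrk’s defaults and only raising the connection count deliberately. For example, with wrk’s standard flags (the URL matches the Crystal server above):

```shell
# wrk defaults: -t2 (threads), -c10 (connections), -d10s (duration)
# --latency adds a percentile breakdown similar to hey's
wrk -t2 -c10 -d30s --latency http://localhost:3000/posts
```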