[Help] Optimization and code review

Hello,

First of all, what an amazing language! I’ve been writing Crystal for a few days now and it’s a pure joy.

I’m looking to replace Go with Crystal as my primary driver, so I wrote the same simple HTTP server in both languages and benchmarked them.

Currently, Go is winning by a slight margin. Crystal is running through WSL (I’m on Windows); I’m not sure whether that carries a penalty, and if so, how big it is.

I don’t know idiomatic Crystal and have never written Ruby. Any input here?

Cheers! :slight_smile:


First question: How did you build the test programs? Did you use release mode?
For such benchmarks it’s always a good idea to have build instructions.

I didn’t look at the implementation in depth, but on a first look you can definitely save on some hash lookups by caching @hash[ht.from] in a local variable.

How did you build the test programs? Did you use release mode?

Yes, it’s written in the header of the test results, but I didn’t write it anywhere else. I’ll add some build instructions, cheers!

you can definitely save on some hash lookups by caching @hash[ht.from] in a local variable.

Store the sub-hash, you mean? “Real use” would imply different [key1][key2] pairs, so I’d say such an improvement would break that use case.

… I should perhaps add some use cases to clarify just exactly what I’m supposed to be testing.

Thank you for your answer :slight_smile:

Oh, I somehow overlooked that. Sorry.

I don’t see how that would break. During the execution of insert(ht : HText), the value of ht.from does not change. And I understand the value of @ht[ht.from] should also be expected to not change. That’s not protected though, so with concurrent execution, I believe the current code could already run into conflicts.
In any case, you should really be able to just store the result of the first @ht[ht.from]? call in a local variable and replace all further calls with that variable.

In any case, you should really be able to just store the result of the first @ht[ht.from]? call in a local variable and replace all further calls with that variable.

Oh, now I get you. The newborn isn’t letting me sleep more than a few hours, let’s blame it on that :joy:

I thought you wanted me to make a TechEmpower-style optimization and cache the first request so that any subsequent request would be returned immediately, effectively gaming the benchmark.

Updated the repo with your suggestions, thank you.

Go’s output became ~131 kb/s faster, while Crystal’s became ~61 kb/s slower? I bumped the total number of requests up to 25k and ran it several times on each server just to be sure.

This is a toy bench, but one of the things that attracted me to Crystal was the “as fast as C” part. I’d love to learn whichever optimizations that would move me towards that goal.

Is it perhaps the lack of parallelism? Go schedules all goroutines across all cores, while IIRC Crystal uses single-core concurrency at the moment; perhaps this is why?

EDIT:

That’s not protected though, so with concurrent execution, I believe the current code could already run into conflicts.

Perhaps later on I’ll add a mutex to both implementations; it’s how you’d do it in Go, at least (without breaking the API).

What happens when you also pass -k to ab? That’s for keep-alive.

Many times the OS limits how many sockets or file descriptors a process can create.

Go: 21523.25 kb/s total.
Crystal: 11872.58 kb/s total.

Go is nearly twice as fast in this case.

Is it due to the lack of threaded concurrency in Crystal? Webserver implementation?

I’ve seen some benchmarks with Crystal in various articles, videos etc., and it sure does seem to be very close to C, just as Go is close to C. Not in this particular case though.

My guess is it’s WSL. You’ll have to find someone to run the benchmark on Linux. On macOS I get that Crystal is faster.

That makes sense, cheers.

If I might ask, what numbers are you getting (with/without keep alive)? My only goal is to be on par with Go, anything above is a bonus.

Sure!

With keep alive:

Go:

ab -k -c 10 -n 25000 -p test.json -T application/x-www-form-urlencoded http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 2500 requests
Completed 5000 requests
Completed 7500 requests
Completed 10000 requests
Completed 12500 requests
Completed 15000 requests
Completed 17500 requests
Completed 20000 requests
Completed 22500 requests
Completed 25000 requests
Finished 25000 requests


Server Software:
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      10
Time taken for tests:   0.354 seconds
Complete requests:      25000
Failed requests:        0
Keep-Alive requests:    25000
Total transferred:      2475000 bytes
Total body sent:        5725000
HTML transferred:       0 bytes
Requests per second:    70556.09 [#/sec] (mean)
Time per request:       0.142 [ms] (mean)
Time per request:       0.014 [ms] (mean, across all concurrent requests)
Transfer rate:          6821.34 [Kbytes/sec] received
                        15778.66 kb/s sent
                        22600.00 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   0.1      0       1
Waiting:        0    0   0.1      0       1
Total:          0    0   0.1      0       1

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%      0
 100%      1 (longest request)

Crystal:

ab -k -c 10 -n 25000 -p test.json -T application/x-www-form-urlencoded http://127.0.0.1:8080/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)
Completed 2500 requests
Completed 5000 requests
Completed 7500 requests
Completed 10000 requests
Completed 12500 requests
Completed 15000 requests
Completed 17500 requests
Completed 20000 requests
Completed 22500 requests
Completed 25000 requests
Finished 25000 requests


Server Software:
Server Hostname:        127.0.0.1
Server Port:            8080

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      10
Time taken for tests:   0.278 seconds
Complete requests:      25000
Failed requests:        0
Keep-Alive requests:    25000
Total transferred:      1550000 bytes
Total body sent:        5725000
HTML transferred:       0 bytes
Requests per second:    89785.31 [#/sec] (mean)
Time per request:       0.111 [ms] (mean)
Time per request:       0.011 [ms] (mean, across all concurrent requests)
Transfer rate:          5436.22 [Kbytes/sec] received
                        20078.94 kb/s sent
                        25515.16 kb/s total

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    0   0.1      0       1
Waiting:        0    0   0.1      0       1
Total:          0    0   0.1      0       1

Percentage of the requests served within a certain time (ms)
  50%      0
  66%      0
  75%      0
  80%      0
  90%      0
  95%      0
  98%      0
  99%      0
 100%      1 (longest request)

Actually, it seems Go is “faster” only because of the transfer rate. Maybe Go returns more headers than Crystal (I think there’s a Date header).

To be honest, I don’t know what should I look at in these benchmarks. Requests per second? Transfer rate?

In any case, Go is built by Google and Crystal is an open source project, so it’s expected that Go will be faster in a lot of cases.

Oh, and I can’t test without -k because macOS limits the number of sockets a process can open, so ab just hangs at some point.

To be honest, I don’t know what should I look at in these benchmarks. Requests per second? Transfer rate?

Nothing specific, I have no real use case, this is just a common benchmark and getting-started-project I use when learning new languages :slight_smile:

In any case, Go is built by Google and Crystal is an open source project, so it’s expected that Go will be faster in a lot of cases.

Oh absolutely. It’s also built specifically to run servers at Google, so anything but fast would be surprising.

Crystal is an amazing language, I’m really enjoying it. Happy to see that it’s holding its own! :smiley:


At least in some cases, requests per second is what matters, and this particular benchmark shows Crystal (89785.31) ahead of Go (70556.09). :slight_smile:


Most likely. Like many Google products, Go chews up as many system resources as it can to mitigate latency or increase throughput. I would recommend running both of these implementations through the Unix time command (assuming it’s available on WSL — I unfortunately don’t have experience with it) to see how much CPU time it’s using to get that kind of performance. From there you can calculate how much CPU time is required per unit of work you’re performing.

In every comparison I’ve done on CPU time used between Go and Crystal, Crystal used less — sometimes as much as 60% less. It’s pretty wild.
