JSON Web Service Benchmark

I would take the TechEmpower benchmarks with a grain of salt. Even their most realistic benchmarks are pretty far from real-world code.

As with all benchmarks, running your own code is the best indicator of how something will perform for your use case rather than trusting contrived benchmarks. If Crystal is the highest performer by that large a margin and performance is one of the top criteria, then it sounds like Crystal is the best choice for this service.

You could try profiling each app to see where the bottlenecks are.

Wow, spider-gazelle is really fast, and the docs look very complete too. Plus the last commit was two days ago!

Thanks for your valuable inputs & suggestions.
The TechEmpower Framework Benchmarks (TFB) have very complete benchmark data.
Here are the things I have found so far regarding my simple benchmark:

  1. actix-based web service implementation
  • it's very possible that I haven't found the correct way to get the maximum performance out of actix
  • so I think I'll consult with Rust users about the correct best practice for using actix
  • and/or add a Go framework for comparison, as Go seems easier to learn than Rust
  2. multi-core CPU
  • TFB uses an Intel Xeon Gold 5120 CPU: 14 cores, 28 threads
  • what I've tested so far used a 1-core VPS, so a multi-core CPU should also be used to get additional data

thanks

TechEmpower is running an old version of spider-gazelle - I need to update it.
This benchmark is more up to date.

Please do! Router performance is interesting, but TechEmpower's benchmarks are more real-world and useful.

Sort of. I mean, I could switch to Redis and win all the benchmarks.
Kemal and Raze are not using ORMs, whereas Amber and SG are.
So it's kind of an apples-to-oranges comparison, and probably misleading to the casual observer.

There are different types of TechEmpower benchmarks. Some of them are meant specifically to test the typical use of different web frameworks (Classification: Fullstack). E.g. you would use, say, ECR templates in framework x but plain methods in framework y. There will be many other differences like this, and such benchmarks measure the default (recommended) way of coding in a particular framework.

For example, why does Rails perform much worse? Part of it is Ruby, of course, but it is also the fact that Rails has a long middleware stack doing lots of things. If you recreated such a stack, doing all the same things, in Crystal, the Crystal benchmarks would slow down as well. Or you can remove some default handlers from the Rails middleware list and improve the Ruby results. But that's probably not what you'll do in real life. Most likely you'll choose a framework and follow its way.

If you want to beat the benchmarks, then Redis is not an option, as it is single-threaded.
Look at Aerospike (aerospike.com), as it is truly multithreaded and has much better features than Redis.

I had big issues with Redis on one of my big projects (I recommended Aerospike, but the CTO was stuck on Redis, even though I explained where and how Redis would break).

Thanks for your valuable inputs.

Here I just want to share some additional info that might be useful.

Regarding actix, I've asked for suggestions in the Rust forum, here:

AFAIK,
it's very possible for a newbie in Rust to follow a simple tutorial example from the internet
and write a web service implementation using actix, like in this article (which can easily be found by googling the words "rest api with actix web postgresql"): https://turreta.com/2019/09/21/rest-api-with-rust-actix-web-and-postgresql-part-1/

But if we read more about actix, which is fundamentally async (unlike other Rust web frameworks, which are sync: nickel, iron, rocket, etc.), we find that the example in that article is sync, i.e. blocking.
In other words, it's the wrong way to use actix, and the performance will be very low.

One of the best practices for using actix with database access is described in the official docs/examples:

Blocking parts of the code must be put inside web::block(move || { ... }).
An async implementation that I've tried is more than 10 times faster than the sync version.
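
For reference, here is a minimal sketch of that pattern under actix-web 2.x (the find_color query and the route are hypothetical stand-ins for the real pooled diesel query, not the benchmark's actual code):

use actix_web::{web, App, Error, HttpResponse, HttpServer};

// Hypothetical blocking query; in the real service this would be a
// diesel call using a pooled connection.
fn find_color(id: i32) -> Result<String, String> {
    Ok(format!("color for {}", id))
}

// The blocking call is offloaded to actix's worker thread pool via
// web::block, so the async event loop is never stalled.
async fn color(path: web::Path<i32>) -> Result<HttpResponse, Error> {
    let id = path.into_inner();
    let color = web::block(move || find_color(id))
        .await
        .map_err(|_| HttpResponse::InternalServerError().finish())?;
    Ok(HttpResponse::Ok().json(color))
}

#[actix_rt::main]
async fn main() -> std::io::Result<()> {
    HttpServer::new(|| App::new().route("/{id}/color", web::get().to(color)))
        .bind("127.0.0.1:8080")?
        .run()
        .await
}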

Regarding the multi-core CPU, I've made the same benchmark on a 4-core CPU (Xeon CPU E3-1225 v5 @ 3.30GHz).
When using 4 concurrent client connections, the Crystal-based frameworks are still relatively fast,
but when the concurrency gets increased, the results are different,
and at 64 concurrent connections, actix + diesel is faster:

The OS and Crystal version used:

$ rpm -qa centos*
centos-release-6-10.el6.centos.12.3.x86_64
$ crystal -v
Crystal 0.30.1 [5e6a1b672] (2019-08-12)

LLVM: 4.0.0
Default target: x86_64-unknown-linux-gnu

Out of curiosity, what is the CPU usage of the various processes in this benchmark? Is it possible that Crystal is serving more requests per second per CPU core consumed and that building with -Dpreview_mt would improve its throughput?

OK, thanks.
I've tried the -Dpreview_mt option and made some tests on another machine/box (8-core i7).

The results (ab test numbers) are copy-pasted here: https://github.com/sharing-lab/ws-benchmark/blob/ca8c792acdb36569425a9024f427685bb27441f9/results/i7_8_core-max100-localhost.txt

The source code is here: https://github.com/sharing-lab/ws-benchmark/tree/ca8c792acdb36569425a9024f427685bb27441f9

Implementations tested in the benchmark:

  • kemal: compiled without -Dpreview_mt
  • kemal-mt-x: compiled with -Dpreview_mt and run with the environment variable CRYSTAL_WORKERS=x (a usage sketch follows below)
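
For example, a kemal-mt-4 run would look roughly like this (the binary name is taken from the build command further down; the exact invocation here is my assumption):

$ CRYSTAL_WORKERS=4 ./bin/ws-rel-pmt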

The chart:

Detailed server environment:

$ cat /proc/cpuinfo |tail -n 30|head -n 10
cache_alignment	: 64
address sizes	: 39 bits physical, 48 bits virtual
power management:

processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 60
model name	: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
stepping	: 3

$ rpm -qa centos*|grep release-7
centos-release-7-7.1908.0.el7.centos.x86_64

$ crystal -v
Crystal 0.33.0 [612825a53] (2020-02-14)

LLVM: 8.0.0
Default target: x86_64-unknown-linux-gnu

$ shards install
Fetching https://github.com/kemalcr/kemal.git
Fetching https://github.com/luislavena/radix.git
Fetching https://github.com/jeromegn/kilt.git
Fetching https://github.com/crystal-loot/exception_page.git
Fetching https://github.com/will/crystal-pg.git
Fetching https://github.com/crystal-lang/crystal-db.git
Installing kemal (0.26.1)
Installing radix (0.3.9)
Installing kilt (0.4.0)
Installing exception_page (0.1.2)
Installing pg (0.20.0)
Installing db (0.8.0)

$ crystal build --release -Dpreview_mt  -o bin/ws-rel-pmt src/ws.cr

I’ve seen Crystal outperform Go on a lot of things, but seeing it outperform Rust is fantastic :star_struck:

Yes, about Go: I've tried some Go web frameworks yesterday (gorilla, fasthttp-router, …) to get some more comparisons.

For Rust: to develop a web service in Rust, it's very common to use an ORM (the most used: diesel),
so in my previous tests, I used diesel as the DB connection layer.
Today, I tried not using an ORM, using tokio-postgres instead (an async library for accessing PostgreSQL).
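
For reference, a minimal sketch of a fully async query with tokio-postgres (the connection string, table, and column names here are placeholders I made up, not the benchmark's actual schema):

use tokio_postgres::NoTls;

#[tokio::main]
async fn main() -> Result<(), tokio_postgres::Error> {
    // Placeholder connection string; adjust host/user/dbname as needed.
    let (client, connection) =
        tokio_postgres::connect("host=127.0.0.1 user=postgres dbname=ws", NoTls).await?;

    // The connection object drives the actual socket I/O,
    // so it must be polled on its own task.
    tokio::spawn(async move {
        if let Err(e) = connection.await {
            eprintln!("connection error: {}", e);
        }
    });

    // A non-blocking query with no ORM layer in between.
    let row = client
        .query_one("SELECT color FROM items WHERE id = $1", &[&55501_i32])
        .await?;
    let color: String = row.get(0);
    println!("{}", color);
    Ok(())
}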

Source code and results (in the results directory) are here: https://github.com/sharing-lab/ws-benchmark/tree/26168797257b0ff773dbe21fb77afcde65ab1b9c

In short, here is the result using ab -n 1000 -c 4 http://127.0.0.1/555XX/color:

Nice! Crystal is doing pretty well.

Could you try doing the same, but with wrk instead of ab? Someone told me in the past that wrk is much more reliable and gives more consistent results.
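
For example, something roughly like this (the thread and connection counts are just an illustration, not a recommendation):

$ wrk -t4 -c64 -d30s http://127.0.0.1/555XX/color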

@fat, some months ago we created and used Benchy: A benchmark tool. You might find it useful to play around with different benchmarks. Let me know if you play with it and if there is any feedback.

Thanks for your suggestions.
Yes, I agree that sometimes we need some scripts to help us automate the benchmark tasks.
I'll take a look at the docs first.

I wonder if somebody should make a Go "fasthttp" equivalent for Crystal… though I admit I don't know much about it, and "acing microbenchmarks" is often not useful in real life, LOL.

OK, today I've made another test, using wrk,
but I've only done that on a VPS box with a 1-core CPU;
maybe later/tomorrow I'll make a test on the i7 box (8 cores).

So, here's the source code, and the results on a VPS box with a 1-core CPU:

The wrk results are copy-pasted in this file: https://github.com/sharing-lab/ws-benchmark/blob/33324a0b23c539f5141f3a7abc14f859678da5d6/results/vps_1_core.txt

Screenshot of the plotted chart & table:

Thanks.

And this is the result with wrk, on another box with a 4-core CPU (yesterday it was only on a single-core CPU box).
Results & source code: https://github.com/sharing-lab/ws-benchmark/tree/5fb78bc110c45a657c81369e95286de61a20c549

Thanks,

Fatih