Hello,
yesterday I used Crystal for the first time. I wanted to see how it compares to Ruby in a simple but important task. It’s about processing and querying data. As a temporary step, I wanted to read hashes from the database, fix the format (real numbers stored as text), order them by the values, write it back. I also wanted to write back an ordered Array of the keys as the JSON keys aren’t ordered.
The JSON strings (Hashes) have tens of thousand key-value pairs.
Ruby version:
require 'pg'
require 'json'
base = PG.connect('host=localhost dbname=xxx')
rows = base.query('select * from temp order by user_id limit 1000')
puts rows.class
start = Time.now
rows.each do |row|
results_hash = JSON.parse row['percentages']
results = results_hash.map { |key, value| [key.to_i, value.to_f] }.sort_by { |x| x[1] }.reverse.to_h
puts results.inspect[0..20]
end
puts Time.now - start
$ time ./reorder.rb
PG::Result
{362369=>90.36, 74229
{163453=>50.71, 24970
...
{910276=>86.16, 53256
{395316=>84.63, 92819
{932416=>85.71, 78133
57.163174
real 1m1.122s
Ruby version with converting the rows into an Array from the PGResult. Only one line differs:
...
rows = base.query('select * from temp order by user_id limit 1000').to_a
...
time ./a_reorder.rb
Array
{362369=>90.36, 74229
{163453=>50.71, 24970
...
{910276=>86.16, 53256
{395316=>84.63, 92819
{932416=>85.71, 78133
56.1434
real 1m0.557s
Crystal version, using PG::ResultSet:
require "db"
require "pg"
module Crystal
DB.open "postgresql://localhost/xxx" do |db|
db.query("select * from temp order by user_id limit 1000") do |rows|
puts rows.class
start = Time.now
rows.each do
user_id, percentages = rows.read Int32, JSON::Any
results = percentages.as_h.map { |key, value| [key.to_i, value.to_s.to_f] }.sort_by { |x| x[1] }.reverse.to_h
puts results.inspect[0..20]
end
puts Time.now - start
end
end
end
$ time ./crystal
PG::ResultSet
{362369 => 90.36, 742
{163453 => 50.71, 249
...
{910276 => 86.16, 532
{395316 => 84.63, 928
{932416 => 85.71, 264
00:00:29.386695000
real 0m29.469s
Crystal version, using Array:
require "db"
require "pg"
module Crystal
DB.open "postgresql://localhost/xxx" do |db|
rows = db.query_all "select * from temp order by user_id limit 1000", as: {Int32, JSON::Any}
puts rows.class
start = Time.now
rows.each do |row|
results = row[1].as_h.map { |key, value| [key.to_i, value.to_s.to_f] }.sort_by { |x| x[1] }.reverse.to_h
puts results.inspect[0..20]
end
puts Time.now - start
end
end
$ time ./rows
Array(Tuple(Int32, JSON::Any))
{362369 => 90.36, 742
{163453 => 50.71, 249
...
{910276 => 86.16, 532
{395316 => 84.63, 928
{932416 => 85.71, 264
00:00:21.955791000
real 0m53.327s
Note that I wrote the Ruby version with sort { |x, y| y[1] <=> x[1] } instead of sort_by {}.reverse but the <=> operator didn’t work in Crystal, and I spent already enough time to make the database connection work. Hence I used sort_by {}.reverse in both languages to be fair.
There were details that surprised me. Other details didn’t.
I knew that Ruby had an excellent PostgreSQL driver. Years ago, when Rust was about v1.0, I compared Ruby vs. Rust vs. Go in raw query performance. Ruby was the fastest due to the cached prepared query. Go was the slowest.
It surprised me that converting the rows to an Array in Ruby took almost no time, and it ran for the same time as the PG::Result.each version.
It surprised me even more how slow was the the query-the-rows-as-Array version in Crystal.
You can see that both Ruby versions spent about 4 seconds on querying, fetching, and processing the result from the database. The rest of the time was spent on converting the strings to integers and floats, ordering it, and making it a Hash again.
Crystal was different. The JSON to Hash, convert strings to integers and floats, ordering, and making it a Hash again part took only 22 seconds. However, the query_all was so slow that in the end, that Crystal version was near the same speed as Ruby.
If I consider that the conversion was about 21 seconds, then the query-fetch part using db.query took only 7-8 seconds.
Conclusion:
In this test, the querying and fetching in Ruby was twice as fast (4s vs. 8s) compared to Crystal’s db.query and rows.each. Even a bit faster as the 4s includes the Ruby starting overhead. However, it’s normal. Ruby also has beaten Rust and Go in this area (many years ago when I tested it). The developer of the Rust pg driver wrote to me that the Ruby pg was excellent C code.
I wonder why the db.query_all and rows.each do |row| version in Crystal was about ten times slower than Ruby, as well as about five times slower than the other Crystal version.
As for the typecasts. I checked the Crystal driver’s JSON part. It also calls JSON.parse on the field but in Crystal, there will be an additional typecast, I believe. It’s the JSON::Any to Hash in my code. (The Ruby JSON.parse returns a Hash). Still, these casts were 2.7 times faster (if I counted right) in Crystal. JSON::Any didn’t support #map, unfortunately. In the end, I Crystal was twice as fast as Ruby, if I used the db.query version.
I’m not comparing the performance of the two languages. It was my first Crystal code, only a tiny selection of the features. Besides, most of these Ruby methods are in C.
However, it wasn’t a synthetic test either. I was comparing solutions. I have to process this data for real. The original table has a few billion records. Crystal will help.
Cheers.