Crystal vs Go: Notes and Reflections

jzakiya · February 25, 2021, 9:13pm

I recently (Feb 14, 2021) had a video session with @lbarasti to ask him to look at my Crystal implementation of my twinprimes sieve and help me make it “better”. We found an issue with using channels that I don’t think is documented, along with a few other issues.

In the process he wondered whether Go (golang.org), which has a concurrency model similar to Crystal’s (concurrency vs true parallelism), would behave similarly. So after the session, I took the following week to (finally) write a Go version (single- and multi-threaded) to compare against Crystal.

What I present here are my notes, reflections, and musings on the technical differences between the two languages, and also on other things: how using them makes me feel, and ways to improve Crystal, relative to Go and other languages, so it can attract more users.

My system:
System76 Gazelle laptop (2016), i7-6700HQ 3.5GHz, 4C|8T, 16GB mem
PCLinuxOS (PCLOS), Linux 5.10.17, KDE desktop
Latest versions at the time of writing (ATOW): Crystal 0.36.1 and Go 1.16

Here are gists for the code.

multi-threaded versions

Go

https://gist.github.com/jzakiya/fbc77b8fdd12b0581a0ff7c2476373d9

twinprimes_ssoz.go

// This Go source file is a multiple threaded implementation to perform an
// extremely fast Segmented Sieve of Zakiya (SSoZ) to find Twin Primes <= N.

// Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1.
// Output is the number of twin primes <= N, or in range N1 to N2; the last
// twin prime value for the range; and the total time of execution.

// This code was developed on a System76 laptop with an Intel I7 6700HQ cpu,
// 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. Parameter tuning
// probably needed to optimize for other hardware systems (ARM, PowerPC, etc).

Crystal

https://gist.github.com/jzakiya/2b65b609f091dcbb6f792f16c63a8ac4

twinprimes_ssoz.cr

# This Crystal source file is a multiple threaded implementation to perform an
# extremely fast Segmented Sieve of Zakiya (SSoZ) to find Twin Primes <= N.

# Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1.
# Output is the number of twin primes <= N, or in range N1 to N2; the last
# twin prime value for the range; and the total time of execution.

# This code was developed on a System76 laptop with an Intel I7 6700HQ cpu,
# 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. Parameter tuning
# probably needed to optimize for other hardware systems (ARM, PowerPC, etc).

single-threaded versions

Go

https://gist.github.com/jzakiya/066b44ec99f5d8fc208521e11870aefe

twinprimes_ssoz_ser.go

// This Go source file is a single threaded implementation to perform an
// extremely fast Segmented Sieve of Zakiya (SSoZ) to find Twin Primes <= N.

// Inputs are single values N, or ranges N1 and N2, of 64-bits, 0 -- 2^64 - 1.
// Output is the number of twin primes <= N, or in range N1 to N2; the last
// twin prime value for the range; and the total time of execution.

// This code was developed on a System76 laptop with an Intel I7 6700HQ cpu,
// 2.6-3.5 GHz clock, with 8 threads, and 16GB of memory. Parameter tuning
// probably needed to optimize for other hardware systems (ARM, PowerPC, etc).

Crystal
Twinprimes generator, single-threaded, using SSoZ (Segmented Sieve of Zakiya), written in Crystal (GitHub gist).

1 Crystal is faster than Go for both versions

For both the single-threaded and multi-threaded versions, Crystal is faster. It wasn’t by a “whole lot”, but in every significant case it was clearly faster.

2 Crystal uses less runtime memory than Go

An original issue with the Crystal multi-threaded version was that memory use kept increasing as the number of threads increased (memory wasn’t being released after threads ended). I used htop to observe this behavior.

Below is the original Crystal code that performed the concurrent processing.

cnts = Array(UInt64).new(pairscnt, 0)      # the number of twinprimes found per thread
lastwins = Array(UInt64).new(pairscnt, 0)  # the largest twinprime val for each thread
done = Channel(Nil).new()

restwins.each_with_index do |r_hi, i|      # sieve twinprimes restracks
  spawn do
    lastwins[i], cnts[i] = twins_sieve(r_hi,kmin,kmax,kb,start_num,end_num,modpg,primes,resinvrs)
    print "\r#{i + 1} of #{pairscnt} twinprimes done"
    done.send(nil)
  end
end
pairscnt.times { done.receive }            # wait for all threads to finish

Lorenzo figured out that this line was the culprit: done = Channel(Nil).new()
He changed it to: done = Channel(Nil).new(pairscnt)
which makes done a buffered channel with one slot per fiber, so each fiber can signal that it is done without waiting for the main fiber to receive. Apparently, with the original unbuffered channel, all the threads|fibers were backing up, each waiting on the one channel to hand off its signal.

Once changed, memory use dropped to a consistently small amount for the whole run.
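
Since the original question was whether Go would behave similarly, here is a minimal Go sketch of the same fan-out pattern with a buffered done channel, as in the fixed Crystal code. It is not the code from the Go gist: sieveTrack is a hypothetical stand-in for twins_sieve that just returns dummy values, and pairscnt = 8 is arbitrary.

```go
package main

import "fmt"

// sieveTrack is a hypothetical stand-in for the real per-residue-pair sieve
// (twins_sieve in the Crystal code); it returns dummy (lastTwin, count) values.
func sieveTrack(i int) (uint64, uint64) {
	return uint64(i) * 10, uint64(i) + 1
}

// runAll starts one goroutine per residue pair and waits for all of them
// using a buffered "done" channel with one slot per goroutine.
func runAll(pairscnt int) (lastwins, cnts []uint64) {
	cnts = make([]uint64, pairscnt)       // twin primes found per goroutine
	lastwins = make([]uint64, pairscnt)   // largest twin prime per goroutine
	done := make(chan struct{}, pairscnt) // buffered: no send ever blocks

	for i := 0; i < pairscnt; i++ {
		go func(i int) {
			// each goroutine writes only its own slots, so no lock is needed
			lastwins[i], cnts[i] = sieveTrack(i)
			done <- struct{}{} // signal completion
		}(i)
	}
	for i := 0; i < pairscnt; i++ {
		<-done // wait until every goroutine has signaled
	}
	return lastwins, cnts
}

func main() {
	lastwins, cnts := runAll(8)
	var total uint64
	for _, c := range cnts {
		total += c
	}
	fmt.Println("total:", total, "last:", lastwins[len(lastwins)-1])
}
```

In everyday Go, a sync.WaitGroup is the more common way to wait for a group of goroutines; the channel form is used here only because it mirrors the Crystal code.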

For Go, memory use rose to a constant level about 2 GB higher than Crystal’s for an input of 1 trillion (1_000_000_000_000). This was consistent for all significant input values: Go hits some high maximum memory use, then stays there.

3 Crystal’s executables are smaller than Go’s (both unstripped and stripped)

For the compilation commands provided in each code version, the executable sizes in bytes (unstripped / stripped) are:

single-threaded: Crystal 942,976 / 467,352; Go 2,128,039 / 1,490,072
multi-threaded:  Crystal 973,264 / 496,040; Go 2,133,275 / 1,494,200

So Go (which promotes itself as having no runtime dependencies) has larger executables because it bundles a larger runtime into them, which lets them run on an OS without anything else installed. This is one reason Go is used for apps such as Caddy server (https://caddyserver.com/) and duf, a terminal tool for checking disk usage in Linux. (Rust has a similar selling point.)

4 Compilation Speed

For almost identical source code sizes, the Go compiler is almost instantaneous, while Crystal chugs along and takes seconds (I didn’t bother to measure the difference).

For people who use Crystal, “slow” compilation speed issues are not new.
However, for people coming from Go, et al, compiler performance might be very discouraging.

5 Semantic differences

I was pleasantly surprised that, once I learned how Go does things, coding in it was nice.
For instance, Go doesn’t have while loops; it uses variants of its for loop instead.

In fact, it’s less wordy to write:

for i, elem := range arry { }
vs
arry.each_with_index do |elem, i| .. end

Go’s range loops give you both the index and the element, and you can use just one, discarding the other with _ (or omitting it, in the index-only form).

for _, elem := range arry { }
and
for i, _ := range arry { }
or
for i := range arry { }
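
A small runnable sketch of these loop forms, plus the condition-only for that Go uses in place of while. The function names are just for illustration.

```go
package main

import "fmt"

// sumElems uses the element-only form: the index is discarded with _.
func sumElems(arry []int) int {
	sum := 0
	for _, elem := range arry {
		sum += elem
	}
	return sum
}

// doubleInPlace uses the index-only form to modify the slice itself.
func doubleInPlace(arry []int) {
	for i := range arry {
		arry[i] *= 2
	}
}

// countdown shows Go's substitute for a while loop: a for with only a condition.
func countdown(n int) []int {
	var out []int
	for n > 0 {
		out = append(out, n)
		n--
	}
	return out
}

func main() {
	arry := []int{10, 20, 30}
	for i, elem := range arry { // both index and element
		fmt.Println(i, elem)
	}
	fmt.Println(sumElems(arry))
	doubleInPlace(arry)
	fmt.Println(arry, countdown(3))
}
```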

6 Go’s strengths and weaknesses

Go was designed within Google starting in 2007, announced in 2009, and went 1.0 on 2012/03/28. It was designed to be statically typed, easy to use, and focused on processing lots of events concurrently.

https://golang.design/history/

Go, however, is not inherently designed for numerically heavy processing, as its standard library is missing many common methods|functions for basic math|numerical operations.

For my purposes (implementing fast numerical algorithms), it lacks true parallelism and the standard numerical methods needed for many types of math-heavy algorithms.
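
As one concrete example of this kind of gap: Go’s math package works on float64, and the standard library has no integer square root for uint64 (math/big has Int.Sqrt, but only for big.Int). Sieves typically need the exact integer square root of N, so a helper has to be hand-written. The isqrt below is a hypothetical sketch, not taken from the gists; it corrects the float64 estimate, which can be off because float64 has only a 53-bit mantissa.

```go
package main

import (
	"fmt"
	"math"
)

// isqrt returns the largest r with r*r <= n, for any uint64 n.
func isqrt(n uint64) uint64 {
	r := uint64(math.Sqrt(float64(n))) // estimate; float64 has only 53 bits of mantissa
	if r > math.MaxUint32 {
		r = math.MaxUint32 // the true root of any uint64 fits in 32 bits
	}
	for r*r > n { // estimate too high: step down (r*r cannot overflow here)
		r--
	}
	for r < math.MaxUint32 && (r+1)*(r+1) <= n { // estimate too low: step up
		r++
	}
	return r
}

func main() {
	fmt.Println(isqrt(1_000_000_000_000)) // 1000000
	fmt.Println(isqrt(math.MaxUint64))    // 4294967295
}
```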

However, if you want to build app servers, database interfaces, or other I/O-heavy software, Go works well.

7 Documentation and coding examples

One thing that really frustrated me with Go was finding coding examples for simple things, like how to read two numbers from the command line, how to create arrays sized at runtime, and how to convert between number types.

I had to spend hours searching online to find usable code examples to answer these questions, because Go’s documentation (that I found) had no simple and clear examples for these use cases.

It’s clear that Go’s documentation style is (pedantically) geared to software developers, not to casual or newbie users. And Go is not the only “sinner” in this regard.

It has been stated over and over that documentation that is easy to find and use can make or break a project. Creating good documentation is a skill, like front-end development, and shouldn’t be left to project developers; if necessary, projects should pay people with those skills to produce it.

OK, rant over.
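
For anyone searching for the same things, here is a minimal sketch covering the three cases mentioned above: reading two numbers from the command line with strconv.ParseUint, creating a slice (Go’s dynamic array) whose size is only known at runtime with make, and explicit numeric conversions, since Go never converts between number types implicitly. parseRange is a hypothetical helper name, and the slice length used here is arbitrary, for illustration only.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseRange is a hypothetical helper: it parses the first two arguments as
// base-10 uint64 values and returns them in ascending order.
func parseRange(args []string) (uint64, uint64, error) {
	if len(args) < 2 {
		return 0, 0, fmt.Errorf("need two numbers, got %d", len(args))
	}
	n1, err := strconv.ParseUint(args[0], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	n2, err := strconv.ParseUint(args[1], 10, 64)
	if err != nil {
		return 0, 0, err
	}
	if n2 < n1 {
		n1, n2 = n2, n1
	}
	return n1, n2, nil
}

func main() {
	n1, n2, err := parseRange(os.Args[1:]) // e.g. ./prog 1000000 2000000
	if err != nil {
		fmt.Fprintln(os.Stderr, "usage: prog N1 N2:", err)
		os.Exit(1)
	}

	// Numeric conversions are always explicit in Go: T(x).
	span := n2 - n1                      // uint64 arithmetic (n2 >= n1 here)
	mid := float64(n1) + float64(span)/2 // uint64 -> float64
	slots := int(span%8) + 1             // uint64 -> int (arbitrary size, for illustration)

	cnts := make([]uint64, slots) // zeroed slice whose length is decided at runtime
	cnts = append(cnts, n2)       // and it can still grow with append

	fmt.Println(n1, n2, mid, len(cnts))
}
```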

8 Reflections

After doing all this, I’m even more impressed by how good Crystal is for its age (2011).
The devs deserve a lot of credit|recognition for its design and implementation.

But here are some things I think must happen to make Crystal better known and more widely accepted by programmers/users.

a) Crystal needs to implement true, easy to use, parallelism.

From my experience implementing my twinprimes sieve, so far in D, Go, Nim, Rust, and Crystal, Rust is by far the fastest, and it uses true parallelism, as do D and Nim. To play in the same space as these languages, Crystal needs a true, easy-to-use parallelism implementation.

By easy to use, I mean Crystal has to eliminate the current need to set CRYSTAL_WORKERS to run concurrent|parallel code, e.g.

$ CRYSTAL_WORKERS=8 ./twinprimes_ssoz 1000000 2000000

None of the other languages have this requirement. It puts too much of a burden on users to do this manually every time, instead of having the language do what every other language does: use all the threads available on the system. If users need to limit the number of threads, they should be able to set that in the code, or by some other simple method.
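
For comparison, this is how Go handles it: since Go 1.5 the runtime’s GOMAXPROCS setting defaults to the number of logical CPUs available, so goroutines run in parallel with no environment variable needed (a GOMAXPROCS environment variable can still override the default), and a program can cap its own thread use from code with runtime.GOMAXPROCS. limitThreads is a hypothetical helper name used only for this sketch.

```go
package main

import (
	"fmt"
	"runtime"
)

// limitThreads caps how many OS threads may execute Go code at once.
// A limit <= 0 leaves the setting unchanged; it returns the setting in effect.
func limitThreads(limit int) int {
	if limit > 0 {
		runtime.GOMAXPROCS(limit)
	}
	return runtime.GOMAXPROCS(0) // an argument < 1 only queries the current value
}

func main() {
	fmt.Println("logical CPUs:   ", runtime.NumCPU())
	fmt.Println("default threads:", limitThreads(0))
	fmt.Println("capped threads: ", limitThreads(4))
}
```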

If Crystal can provide a true parallel threading model that is simple to code, it could be a major player in math-heavy fields like AI, data analytics, and machine learning, because it’s already performant at math.

b) Crystal needs at least one killer app, or field of specialty.

I (and others) see that Crystal needs a killer app to make its name known to the general public. It needs its equivalent of Rails to Ruby. (Can Amber, Kemal, etc, be it?)

Go has killer apps like Caddy server, et al., and is recognized for concurrent processing. Crystal is good at a lot of things, but is known to the general public for nothing in particular.

One thing I’ve seen from this exercise is that Crystal can be a player in the concurrent processing space too, maybe even better than Go, with more development and a concerted effort to demonstrate and publicize its capabilities.

Conclusion

Crystal right now is very good, especially for some use cases, but it can be excellent for a wider set if it does the things I suggest. I’ll be really interested to see what happens once Crystal hits 1.0.

Jabari Zakiya

straight-shoota · February 25, 2021, 9:27pm

Are the Crystal binaries statically or dynamically linked? If they were statically linked, the same remarks about runtime dependencies would apply to Crystal.

jzakiya · February 25, 2021, 9:58pm

I compiled the Crystal code with the compilation command shown in the source code.
I don’t think my system allows static compilation, but if it’s possible, it’s not documented how to do it (I thought you needed to use musl for that).

jzakiya · March 3, 2021, 4:35am

Tonight I did some research into using WireGuard VPN.

One commenter suggested using Tailscale - https://tailscale.com/ - because their code is open source, and it turns out it’s written in Go.

After looking at the code, I immediately thought this is something that could be done well in Crystal too, though Crystal was never considered by that project.

As WireGuard is an open protocol and will likely become the most widely used VPN, this is the type of use case where a Crystal implementation could become a “killer app” and showcase Crystal to the general public.
