Question about Crystal, Compiled Code, and Performance

nsuchy · May 9, 2019, 4:16pm

It’s my understanding that Crystal code is compiled with LLVM into native machine code. If that’s the case why is code written in C and compiled in GCC faster than Crystal code compiled with LLVM? If they’re both native machine code shouldn’t performance be the same?

asterite · May 9, 2019, 4:46pm

What code are you comparing?

nsuchy · May 9, 2019, 4:59pm

Just the benchmarks from the book Programming Crystal. C outperformed Golang and Crystal in every example.

asterite · May 9, 2019, 5:21pm

I guess we’d have to see the code, but in general Go and Crystal allocate more memory than C, and C is older than Crystal and Go, so it might have more optimized algorithms.

sol.vin · May 10, 2019, 2:49am

LLVM has some overhead that C doesn’t have, plain and simple. I would highly recommend you check out kostya/benchmarks and look at the memory used by Crystal: it’s a lot more than C in some cases. That’s a huge potential bottleneck, since the computer obviously has to allocate, write, and read that memory to do things. There are cases where Crystal barely underperforms C; in Matmul’s case, the difference is a mere 0.06 seconds.

At the end of the day, LLVM turned into compiled ASM is NOT the same as C turned into compiled ASM. If you were to take a C program, compile it to LLVM, and then compile it to machine code, you’d probably see similar speed issues. But of course, the LLVM emitted by Crystal is going to be fundamentally different from the LLVM emitted by C code, so it’s a little hard IMO to directly compare the two as if they were supposed to be the same. Crystal is going to have applications where it meets C’s speed and memory profile, or just barely underperforms it, and there are even times where C might have a larger memory profile than Crystal.

It all depends on the optimizations and methodologies used by either language.

Although take what I say with a grain of salt, I’m no master of LLVM.

EDIT: made a mistake, meant to say compiled to native machine code instead of C code

asterite · May 10, 2019, 3:04am

I’m not sure that’s entirely correct. LLVM can compile C (this is clang) and it generates code that matches, and sometimes outperforms, the performance of gcc.

The thing here is that Crystal has a GC and C doesn’t. In Crystal, arrays are a reference to a pointer, while in C they are just pointers. In C everything’s a struct, and in many cases you don’t allocate heap memory for them. In C, strings can be mutable, so you can reuse memory, and so on.
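To make the stack-versus-heap and mutable-string points concrete, here is a minimal C sketch. The names Point, sum_on_stack, sum_on_heap, and upcase_in_place are made up for illustration; the heap version only approximates what a reference type costs and is not a model of how Crystal actually lays out objects.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* A plain C struct: a local variable of this type lives on the stack,
   so creating one needs no heap allocation. */
typedef struct {
    int x;
    int y;
} Point;

/* Stack version: no malloc, nothing for an allocator or a GC to track. */
int sum_on_stack(int x, int y) {
    Point p = { x, y };
    return p.x + p.y;
}

/* Heap version: one allocation per object plus a pointer indirection on
   every access, roughly the cost pattern of a reference type. */
int sum_on_heap(int x, int y) {
    Point *p = malloc(sizeof *p);
    if (p == NULL) {
        abort(); /* out of memory; a sketch-level way to bail out */
    }
    p->x = x;
    p->y = y;
    int sum = p->x + p->y;
    free(p); /* freed by hand here; a GC would reclaim it later instead */
    return sum;
}

/* C strings are mutable, so the buffer is rewritten in place instead of
   allocating a second string for the result. */
void upcase_in_place(char *s) {
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        *s = (char)toupper((unsigned char)*s);
    }
}
```

Calling upcase_in_place on a writable buffer such as `char buf[] = "hello!";` changes that buffer directly, whereas a language with immutable strings has to allocate a new one for the result.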

sol.vin · May 10, 2019, 4:38am

I know about clang. What I meant was that you can’t write a piece of code in Crystal, emit LLVM, and expect it to be exactly the same as the LLVM emitted for an identical piece of C code. If I’m not mistaken, the two pieces of code will have some big differences, like GC, std classes, etc.

For example, I wrote two programs, one in Crystal and one in C. Both of them do about the same thing: print the string "HELLO!\n".

test.cr

print "HELLO!\n"

test.c

#include <stdio.h>

int main()
{
    printf("HELLO!\n");
    return 0;
}

I then made each emit LLVM-IR using the following commands

crystal-vs-c$ crystal build test.cr --emit llvm-ir -o test-cr.ll
crystal-vs-c$ clang -emit-llvm -c test.c -S -o test-c.ll

The difference between the two is night and day: the C code ends up being a succinct 25 lines of LLVM-IR, whereas the Crystal code comes out to around 70000 lines of LLVM-IR. I tried emitting LLVM without the prelude, but it won’t work, since it can’t find print without it.

https://gist.github.com/redcodefinal/4aee655c9a6ccbe0669b6cbecf352cf6

test-c.ll

; ModuleID = 'test.c'
source_filename = "test.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

@.str = private unnamed_addr constant [8 x i8] c"HELLO!\0A\00", align 1

; Function Attrs: noinline nounwind optnone uwtable
define i32 @main() #0 {
  %1 = alloca i32, align 4

This file has been truncated.

test-cr.ll

; ModuleID = 'main_module'
source_filename = "main_module"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%String = type { i32, i32, i32, i8 }
%"Slice(UInt8)" = type { i32, i1, i8* }
%"Array(String)" = type { i32, i32, i32, %String** }
%"->" = type { i8*, i8* }
%"Thread::LinkedList(Fiber)" = type { %Fiber*, %Fiber*, %"Thread::Mutex"* }

This file has been truncated.

GC/reference overhead are both good points though, I didn’t think of that.

Again, I’m not a master of LLVM, so maybe I’m missing something else, but I would love to learn more about LLVM internals from someone who works with it all the time.

straight-shoota · May 10, 2019, 8:44am

The overhead in this example comes mostly from Crystal’s stdlib runtime, not LLVM.

The following Crystal code resembles the C implementation more closely:

require "lib_c"
require "c/stdio"

LibC.printf pointerof("HELLO!\n".@c)

Compiled with --prelude=empty --no-debug, this emits 33 lines of LLVM IR and works much like the C example, with only minimal overhead.

Obviously, you wouldn’t want to write a larger program like this. And for writing any serious application, you will need some kind of libraries and enhancements to a minimal runtime. Crystal’s stdlib just provides a lot of features you’ll probably need anyway, right from the start.
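The same trade-off between a minimal runtime and a convenient library exists inside C itself. The sketch below is an illustration on a POSIX system, with made-up function names print_via_stdio and print_via_write: one goes through the stdio library, the other hands the bytes straight to write(2).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Library route: stdio buffers the text and flushes it to the underlying
   file descriptor later (forced here with fflush). */
int print_via_stdio(const char *text) {
    assert(text != NULL);
    int rc = fputs(text, stdout);
    fflush(stdout);
    return rc; /* non-negative on success */
}

/* Minimal route: pass the bytes directly to the POSIX write(2) call,
   skipping stdio's buffering layer entirely. */
ssize_t print_via_write(const char *text) {
    assert(text != NULL);
    return write(STDOUT_FILENO, text, strlen(text)); /* bytes written */
}
```

The second function pulls in less library code, but anything beyond trivial output (formatting, buffering, error handling) would have to be rebuilt by hand, which is the same reason a larger program wants the stdlib.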

sol.vin · May 10, 2019, 4:45pm

Whoa, didn’t know you could require C’s stdio like that in Crystal!

straight-shoota · May 10, 2019, 9:59pm

Yeah, with just the compiler and no stdlib, you can essentially write C code with Crystal syntax.

nsuchy · May 23, 2019, 3:15pm

Could we reduce the amount of stdlib data that gets sent to LLVM? Would that be a long-term goal? For example, if we load JSON from the stdlib, we might not also need to load Random. If this isn’t being considered already, could we track it as a long-term goal to minimize the CPU cycles and RAM used by a Crystal program?

straight-shoota · May 24, 2019, 9:08am

I’m not sure what you mean. The compiler already skips code that is never called. If your program doesn’t use random, it won’t be compiled into the executable.

nsuchy · May 24, 2019, 4:52pm

I was referring to this. Is the entire standard library not included? If so, then this is already resolved.

straight-shoota · May 27, 2019, 10:12am

The entire stdlib is available by default, but only the parts that are actually used will be compiled into the binary.
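A rough C analogy, offered as a sketch rather than a description of how the Crystal compiler works internally: GCC and Clang typically leave an unused static function out of the generated code, so merely having it in the source costs nothing in the binary. The function names below are made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Never called anywhere. Compilers such as GCC and Clang typically emit no
   machine code for an unused static function (they may warn about it). */
static unsigned unused_helper(unsigned seed) {
    return seed * 1103515245u + 12345u;
}

/* Reachable from run(), so this one is compiled into the binary. */
static int used_helper(int n) {
    return n + 1;
}

int run(int n) {
    assert(n < INT_MAX); /* n + 1 must not overflow */
    return used_helper(n);
}
```

One way to check this on a given toolchain is to inspect the symbol table of the compiled object (for example with nm) and see whether unused_helper appears.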
