Writing Crypto in Crystal

Dear crystal community,

I was having a lot of fun implementing RFC 8439 in Crystal. This might not be a very well known RFC. It mainly talkes about the Cipher ChaCha20 the Authenticator Poly1305 and an application in the AEAD mode which allows to create authenticated cipher text with additional data.

Here is the repo link: GitHub - threez/rfc8439.cr: Crystal implementation ChaCha20 stream cipher as well as the Poly1305 authenticator and the AEAD mode defined in rfc8439.

Here a preview of how it is to use it:

require "rfc8439"

# key and nonce are usually given using Bytes,
# but for convinience can be done as a hex string
key = "00:01:02:03:04:05:06:07:08:09:0a:0b:0c:0d:0e:0f:10:11:12:13:14:15:16:17:18:19:1a:1b:1c:1d:1e:1f"
nonce = "00:00:00:09:00:00:00:4a:00:00:00:00"
msg = "Hello World".to_slice

cipher = Crypto::ChaCha20.new(key, nonce)
encrypted = cipher.encrypt(msg)

# encryption is done using XOR so decryption is done
# by encrypting the cypher text
cipher = Crypto::ChaCha20.new(key, nonce)
plaintext = cipher.encrypt(encrypted)

puts plaintext

The reason that I’m writing, besides sharing my fun project is, that I wanted to raise a few questions:

  1. How do you feel about implementing crypto in Crystal in general. Currently we have a heavy depenency on OpenSSL which has its pros and cons, and languages like Go and Java went to implement them natively in the language.
  2. What do you think about my implementation? Is it readable, is it idiomatic?
  3. I was trying to optimize the perfromance of the implmentation and got to encrypt 1 GB in 3 sec on my machine (emulated linux). I have a good intuition on how to optimize code in C/C++ and Go but must admit, that I find it rather difficult to do the same in Crystal. Performance - Crystal doesn’t contain all to many infos that helped my, and I would like to make it a little faster without hurting readability of the code much. I lack the tooling that helps me to identitfy the next steps and maybe didn’t find the right material.

Thanks

2 Likes

My visceral response is don’t first try to write crypto algorithms|apps in native Crystal.

There’s a whole suite of tried|true|tested code to do the various crypto algorithms, usually in C|C++, that you can create wrappers for and use.

Writing Secure Crypto is HARD!

Unless you plan to take the years to test every routine to verify its security, on all kinds of hardware, against all known (including Quantum) attacks, wrap what’s considered to be secure, and provide a Crystal API to them.

But if you want to do it just for fun, then go ahead.

Thank you for your feedback.

I have thoroughly tested my code, ensuring that it meets all the requirements stipulated by the relevant RFC. The RFC, designed to standardize cryptographic protocol implementations, serve as a crucial benchmark for interoperability across various programming languages. Your inquiry seems to suggest that Crystal may not be a suitable language for implementing cryptographic code. Is that your position?

During my studies I took a class on Cryptography at Berkeley University, the complexity of the subject was immediately emphasized. We are in agreement on this point—many technological endeavors, whether it be developing databases like MongoDB or programming languages like Crystal, are inherently difficult. MongoDB, after years of refinement, is now considered production-ready. In contrast, Crystal, while promising, still lacks some critical properties, which I acknowledge despite my affinity for the language. The essence of my argument is that complexity should not deter us; rather, it should motivate us to invest the necessary time and effort.

My focus lies not in inventing new cryptographic algorithms but in implementing established ones, such as ChaCha20 and AES. Post-quantum security is indeed a challenge that pertains to algorithm design rather than implementation. For instance, ChaCha20 can be implemented to function in constant time, mitigating side-channel attacks. However, for Poly1305, I opted for GMP due to convenience, despite its potential vulnerability to such attacks. It’s noteworthy that Poly1305 is not available in my Crystal OpenSSL setup, rendering any wrapping attempts ineffective.

While my endeavor is driven by personal interest, I have confidence in the reliability of my implementation. Though side-channel attacks are theoretically feasible, their practical execution is complex. It’s worth noting that even widely used libraries like OpenSSL are not immune to vulnerabilities, and their extensive codebases can make thorough auditing challenging.

In conclusion, the pursuit of mastery in any field requires perseverance and resilience. The evolution of Crystal serves as a testament to the value of patience and dedication, despite the inherent challenges.

2 Likes

No, no, that’s not what I meant.

I’m absolutely sure you can implement the algorithms in Crystal to be numerically correct (produce correct outputs for given inputs).

But the attack vectors for compiled code is enormous.

And the attacks have gotten so sophisticated, they can discern information from the compiled binaries. So the same source code can lead to vulnerabilities when compiled with different compilers, on different hardware, on different cpus.

So there’re libraries to ensure constant time flow, so different inputs create the same hardware flow times, to temperature information leaks, that produce different hardware heat profiles that can leak information.

Of course, not all of these things can be done under all situations by all bad actors. But the point is writing code to correctly produce crypto algorithms is just a necessary, but not sufficient requirement, if the intent is to achieve high security systems. You still have to have the compiled code pass certain attack vectors too.

I mean even hardware crypto chips|instructions are vulnerable to attacks.

Like I said, if you want to do it to show it can be done, then don’t worry about all this stuff. But if you want Crystal implementations to be considered secure, in the professional cryptographic community sense, these other factors have to be considered for the compiled code’s layout and operation.

2 Likes

Crystal should be just as good as C or Go to implement cryptography algorithms. Yeah, cryptography is hard, but we still need cryptography and people to engineer & implement cryptographically secure algorithms.

So let’s get back on the actual topic: how to optimize yet be idiomatic Crystal.

  1. Go unsafe (when it’s safe):

    Overall, it should mostly be going unsafe as in Rust (when it’s safe to be unsafe): skip overflow checks by using wrapping operators (e.g. &+), skip bound checks by going to raw pointers (e.g. state.to_unsafe[i] instead of state[i]), …

    You could also provide an #unsafe_encrypt method that won’t check that bytes are of block size, that you can then call it in places where you know it’s true (but still call the safe version with a compilation flag, for example for the test suite).

  2. Avoid HEAP allocations, especially repeated ones, when possible. For example when you only need a slice for the duration of a method, instead of Bytes.new(16) you’ll want:

    buf = uninitialized UInt8[16] # => StaticArray(UInt8, 16)
    footer = buf.to_slice         # => Bytes aka Slice(UInt8)
    
  3. Avoid pass by value: don’t pass a StaticArray as a method argument, because it will pass it by value, hence copy it all (even if LLVM inlines it will still copy it), you’ll want to pass a Slice instead.

  4. Avoid repeatedly updating an ivar: copy the ivar as a local var, update the local var in the loop, and eventually save the ivar once after the loop.

I hope these pointers are enough for starters @threez

8 Likes

In terms of idiomatic crystal: You should never have to reach to intrinsics such as memcpy or memmove. Look at #copy_from and #copy_to on Pointer and Slice.

1 Like

Thanks @ysbaddaden,

I followed your recommendations and was able to improve the performance of both algorithms.

The biggest gain for ChaCha20 was the use of to_unsafe to avoid the pointer checks.

The biggest gain for Poly1305 was however to implement a specialized method for the main execution of le_bytes_to_num as le_bytes_to_num17 with a reduced set of BigInt operations. These operations are quite costly and as a result the code is now 5 times faster.

  private def le_bytes_to_num17(buf : Bytes) : BigInt
    acc = IO::ByteFormat::LittleEndian.decode(UInt128, buf)
    EB &+ acc
  end
Before:
                                    user     system      total        real
CHACHA20 Native (1GB)           2.462670   0.000000   2.462670 (  2.462678)
POLY1305 Native (64MB)          5.086101   0.000000   5.086101 (  5.086076)
AEAD_CHACHA20_POLY1305 (64MB)   5.132015   0.009999   5.142014 (  5.142692)

After:
                                    user     system      total        real
CHACHA20 Native (1GB)           2.151268   0.000000   2.151268 (  2.151287)
POLY1305 Native (64MB)          0.834339   0.000000   0.834339 (  0.834348)
AEAD_CHACHA20_POLY1305 (64MB)   1.037798   0.000000   1.037798 (  1.037808)
3 Likes

Nice! We have a hunch that the GMP library is a fantastic library but we could do better performance wise, but implementing big math is a huge task.