OpenSSL::Digest store and resume

Hi all

I’m trying to implement a “resumable” SHA256 digest that can be persisted and later resumed when new data arrives. The use case is a distributed uploader where the data arrives in chunks.

For this, I’m trying to store a LibCrypto::EVP_MD_CTX_Struct somewhere (disk, Redis) and later resume it with a new Digest::SHA256 instance (or a subclass of it, since the initializer that takes a ctx is protected).

Any pointers for this? Is this even possible? (I have failed so far.)

You must keep the Digest::SHA256 instance around, push data into it as it arrives, and eventually call final (or hexfinal) on it to compute the actual digest. For example:

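require "digest/sha256"

digest = Digest::SHA256.new

# receive_data? and append_to_file are placeholders for your own I/O
while data = receive_data?
  append_to_file(data) # store the raw chunk
  digest << data       # feed the same bytes into the running hash
end

digest.hexfinal # finalize once, after the last chunk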

Thanks for the reply.

I would like to avoid keeping the Digest::SHA256 instance in memory and instead persist it to some storage, so that a “stateless” server can resume the calculation from a persisted intermediate state.

In Go, for example, this is possible by asserting h.(encoding.BinaryMarshaler) to save the state and later h.(encoding.BinaryUnmarshaler) to restore it (where h is the digest object).

I don’t think this is possible. EVP_MD_CTX is an opaque pointer in OpenSSL.

I’m not even sure why our bindings define a data layout for the struct type; it looks very wrong for modern OpenSSL. That doesn’t really matter in practice because we only deal with pointers anyway.

The relevant part for your problem is that the internal definition of the EVP_MD_CTX struct is very complex and contains lots of pointers into the OpenSSL engine. It’s not meant to be serialized out of memory.

I’m assuming you’re referring to https://pkg.go.dev/crypto/sha256, which is a native implementation of the SHA-256 digest in Go. This means the internal data structures are known directly at compile time instead of being hidden behind the OpenSSL library interface. The context is also much simpler because it specializes in this single algorithm, whereas OpenSSL’s EVP_MD_CTX is a general context for many different algorithms.

With a native, purpose-specific SHA-256 implementation in Crystal, serialization would be much more feasible.
With the OpenSSL backend, it is not.
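
To illustrate why a native implementation makes this tractable, here is a minimal sketch, not a vetted library: a hypothetical ResumableSHA256 class in pure Crystal, following FIPS 180-4. Its entire state is eight UInt32 words, at most 63 buffered bytes and a byte counter, so the hypothetical to_bytes/from_bytes methods can round-trip it through disk or Redis. It is unoptimized and unaudited; the stdlib Digest::SHA256 stays OpenSSL-backed and is not affected.

```crystal
# Hypothetical pure-Crystal SHA-256 (FIPS 180-4); not audited, not optimized.
# The whole state is plain data, so it can be serialized and resumed later.
class ResumableSHA256
  # Round constants (FIPS 180-4, section 4.2.2).
  K = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2
  ].map(&.to_u32)

  # Initial hash value (FIPS 180-4, section 5.3.3).
  H0 = [0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
        0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19].map(&.to_u32)

  # @h: chaining value, @pending: bytes not yet compressed (< 64),
  # @length: total number of bytes absorbed so far.
  def initialize(@h : Array(UInt32) = H0.dup,
                 @pending : Array(UInt8) = [] of UInt8,
                 @length : UInt64 = 0_u64)
  end

  def update(data : String | Bytes) : self
    bytes = data.to_slice
    bytes.each { |byte| push(byte) }
    @length = @length &+ bytes.size.to_u64
    self
  end

  # Finalizes a copy, so this instance can keep absorbing chunks.
  def hexfinal : String
    ResumableSHA256.new(@h.dup, @pending.dup, @length).finish
  end

  # Layout: 8 big-endian UInt32 words, big-endian UInt64 length, pending bytes.
  def to_bytes : Bytes
    io = IO::Memory.new
    @h.each { |word| io.write_bytes(word, IO::ByteFormat::BigEndian) }
    io.write_bytes(@length, IO::ByteFormat::BigEndian)
    @pending.each { |byte| io.write_byte(byte) }
    io.to_slice
  end

  def self.from_bytes(bytes : Bytes) : ResumableSHA256
    raise ArgumentError.new("bad state size") unless (40...104).includes?(bytes.size)
    io = IO::Memory.new(bytes, writeable: false)
    h = Array(UInt32).new(8) { io.read_bytes(UInt32, IO::ByteFormat::BigEndian) }
    length = io.read_bytes(UInt64, IO::ByteFormat::BigEndian)
    new(h, bytes[40..].to_a, length)
  end

  # Appends 0x80, zeros up to 56 mod 64, then the 64-bit message bit length.
  protected def finish : String
    bit_length = @length &* 8_u64
    push(0x80_u8)
    ((55 - (@length % 64_u64).to_i) % 64).times { push(0_u8) }
    8.times { |i| push(((bit_length >> (56 - 8 * i)) & 0xff_u64).to_u8) }
    @h.map { |word| word.to_s(16).rjust(8, '0') }.join
  end

  private def push(byte : UInt8) : Nil
    @pending << byte
    return unless @pending.size == 64
    compress(@pending)
    @pending.clear
  end

  private def rotr(x : UInt32, n : Int32) : UInt32
    (x >> n) | (x << (32 - n))
  end

  # One 64-byte block; &+ is wrapping addition (plain + would raise on overflow).
  private def compress(block : Array(UInt8)) : Nil
    w = Array(UInt32).new(64, 0_u32)
    16.times do |i|
      w[i] = (block[4 * i].to_u32 << 24) | (block[4 * i + 1].to_u32 << 16) |
             (block[4 * i + 2].to_u32 << 8) | block[4 * i + 3].to_u32
    end
    (16...64).each do |i|
      s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
      s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
      w[i] = w[i - 16] &+ s0 &+ w[i - 7] &+ s1
    end
    a, b, c, d, e, f, g, hh = @h
    64.times do |i|
      t1 = hh &+ (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) &+ ((e & f) ^ (~e & g)) &+ K[i] &+ w[i]
      t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) &+ ((a & b) ^ (a & c) ^ (b & c))
      hh, g, f, e, d, c, b, a = g, f, e, d &+ t1, c, b, a, t1 &+ t2
    end
    [a, b, c, d, e, f, g, hh].each_with_index { |v, i| @h[i] = @h[i] &+ v }
  end
end

# Usage: request 1 absorbs a chunk and persists the state (e.g. in Redis).
state = ResumableSHA256.new.update("The quick brown fox ").to_bytes
# Request 2, possibly in another process, restores it and continues.
puts ResumableSHA256.from_bytes(state).update("jumps over the lazy dog").hexfinal
```

The serialized blob is at most 103 bytes. The layout is an ad-hoc choice for this sketch; a real format would want a version byte so that stored states survive changes to the code.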

The only options I can think of are to either digest each chunk individually or digest the whole data at the end.
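
A minimal sketch of both options with the stdlib’s OpenSSL-backed Digest::SHA256, assuming the chunks end up in a single file on disk. The helper names chunk_digest and file_digest are hypothetical. Keep in mind that per-chunk hex strings are not the SHA-256 of the whole upload; they only let you verify each chunk.

```crystal
require "digest/sha256"

# Option 1 (hypothetical helper): hash each chunk on its own. The hex string
# is small enough to store next to the chunk, but it is NOT the SHA-256 of
# the whole upload.
def chunk_digest(chunk : String | Bytes) : String
  Digest::SHA256.hexdigest(chunk)
end

# Option 2 (hypothetical helper): once every chunk has been written to
# `path`, stream the assembled file through a single digest.
def file_digest(path : String) : String
  digest = Digest::SHA256.new
  buffer = Bytes.new(64 * 1024)
  File.open(path) do |file|
    # read returns 0 at end of file
    while (n = file.read(buffer)) > 0
      digest.update(buffer[0, n])
    end
  end
  digest.hexfinal
end
```

The second option reads the file in 64 KiB slices, so memory use stays flat regardless of upload size; the cost is one extra pass over the data after the last chunk arrives.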

Thanks for this update - much appreciated!