@ysbaddaden started toying with BLAKE3, trying to assess how much faster it was compared to SHA256 in Crystal, and see if it would bring some free improvement…
What started as a simple benchmark ended with the discovery of a performance issue related to the initialization of Crystal structs, which stems from Crystal's Ruby-based syntax.
The Crystal compiler will always copy the structs, yes. It passes structs by value. LLVM optimizes the code that Crystal generates and might optimize the pass-by-value to a pass-by-reference if and only if it has an optimization for that calling pattern.
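To make the by-value semantics concrete, here is a minimal sketch. `Counter` and `bump` are hypothetical names made up for this illustration; the point is only that the method mutates its own copy of the struct.

```crystal
# Hypothetical struct, used only to illustrate by-value semantics.
struct Counter
  getter value : Int32

  def initialize(@value : Int32)
  end

  def increment : Nil
    @value += 1
  end
end

# The argument is a copy of the caller's struct, so the mutation
# below only affects the local copy.
def bump(counter : Counter) : Int32
  counter.increment
  counter.value
end

original = Counter.new(1)
bumped = bump(original)

puts bumped         # value seen inside the method's copy
puts original.value # the caller's struct is unchanged
```

Whether LLVM then turns that copy into something cheaper is a separate question from the semantics shown here.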
It sounds like Julien was saying that if the ivar is a reference (he mentions Pointer, but I don’t know if that means specifically instances of Pointer(T) or any Reference type) then LLVM has a great optimization for that calling pattern: the entire struct is inlined to the point where there’s no difference in the generated code between using the struct and not using it. It’s 100% free at runtime.
This aligns with my experience with simple structs where even my most micro of microbenchmarks showed no performance difference between using the wrapper struct and performing the same operations on the wrapped class instance.
During codegen there are no more differences between Reference and Pointer. For LLVM it’s a mere pointer.
What I meant is that when we wrap a pointer-sized value in a struct (nothing more) then LLVM codegen will optimize the struct away, and the generated assembly will be identical to passing the pointer/reference directly. The struct becomes a nice abstraction with zero cost.
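A rough sketch of the kind of wrapper I have in mind, assuming a 64-bit target. `Buffer`, `BufferWriter` and `fill` are hypothetical names for the example, not anything from the stdlib.

```crystal
# Hypothetical reference type being wrapped.
class Buffer
  getter bytes : Array(UInt8)

  def initialize
    @bytes = [] of UInt8
  end

  def push(byte : UInt8) : Nil
    @bytes << byte
  end
end

# Hypothetical wrapper: a struct holding a single reference and nothing else.
struct BufferWriter
  def initialize(@buffer : Buffer)
  end

  def write(byte : UInt8) : Nil
    @buffer.push(byte)
  end
end

# The struct is passed by value, but its only field is a reference,
# so the writes land in the same Buffer the caller holds.
def fill(writer : BufferWriter) : Nil
  writer.write(1_u8)
  writer.write(2_u8)
end

buffer = Buffer.new
fill(BufferWriter.new(buffer))

puts buffer.bytes.size
puts sizeof(BufferWriter) == sizeof(Pointer(Void)) # wrapper is pointer-sized
```

To check the claim on your own machine, one option is to build with `crystal build --release --emit asm` and compare `fill` against a variant that takes the `Buffer` directly. I'd expect them to match, but that depends on the LLVM version and target.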
But as soon as we pass something else (e.g. wrap the value in a StaticArray), the generated assembly starts to differ, even if the struct is still pointer-sized. I think that's because of alignment: it must copy each internal value. At that point, the struct might not be a zero cost abstraction anymore.
Now, unless the struct is significantly large, the cost should be hardly noticeable.
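For comparison, a sketch of a struct that is still 8 bytes but wraps a `StaticArray` instead of a pointer. `Word` and `total` are hypothetical names; the size check assumes a 64-bit target, and the code says nothing by itself about what assembly LLVM emits.

```crystal
# Hypothetical wrapper around a StaticArray of 8 bytes.
struct Word
  def initialize(@bytes : StaticArray(UInt8, 8))
  end

  # Sums the wrapped bytes as an Int32.
  def sum : Int32
    total = 0
    @bytes.each { |byte| total += byte }
    total
  end
end

# The struct (and the array inside it) is passed by value.
def total(word : Word) : Int32
  word.sum
end

bytes = StaticArray(UInt8, 8).new { |i| (i + 1).to_u8 }
word = Word.new(bytes)

puts total(word)
puts sizeof(Word) # 8 bytes of payload, same as a pointer on 64-bit targets
```

Same size as the pointer wrapper, but the value is an aggregate of eight bytes rather than a single pointer, which is where I'd look for the extra copies in the emitted assembly.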