I’m sure this is going to be shot down, but it’s something I’ve been thinking about for a while and wanted to get my thoughts down here. Who knows, maybe I’m not as alone in this as I think.
Preface: I know this is an edge case specific to me and probably few others in the Crystal community. Most developers will likely never touch a 128-bit integer, let alone one larger than that. However, that’s not to say that larger integers are never needed.
Currently Crystal natively supports integers up to 64 bits. 128-bit support is partially done, but it still causes issues when it comes to mathematical operations and isn’t natively supported by the parser. Crystal also supports arbitrary-sized integers via BigInt, which uses GMP underneath.
Now there’s nothing wrong with GMP, but there are areas where it lacks. For one, BigInt has no notion of byte size. Sure, you can get the bit_length and find out how many bytes the number will fit into, but this is useless when it comes to serialization, which is the other issue.
Int subtypes can easily be written to an IO using IO#write_bytes. Part of the reason this is possible is that the bitwidth of the integer is known, as it’s based on the Int sub struct. So we know that an Int32 is going to have 32 bits, or 4 bytes, and can easily represent the number as those 4 bytes. When it comes to a BigInt this is not the case.
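For example, a small illustration using stdlib IO::Memory and IO::ByteFormat (the values are arbitrary):

io = IO::Memory.new
io.write_bytes(1234_i32, IO::ByteFormat::LittleEndian)
io.to_slice.size # => 4 -- always 4 bytes, because Int32 is known to be 32 bits

# Reading back is just as easy, since the size is part of the type:
io.rewind
io.read_bytes(Int32, IO::ByteFormat::LittleEndian) # => 1234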
Yes, it is still possible to fudge things. You can look at the bit_length of the BigInt, find the closest byte size that aligns, and write the BigInt as if it were an integer of that size. This works if the integer your BigInt is holding happens to be the right size. But say your BigInt isn’t holding the right number of bits to align to the number of bytes you need: you might need a 256-bit integer, or 32 bytes of data, but end up with only 16 bytes of data.
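A rough sketch of that workaround (the write_big helper is hypothetical and mine, just to illustrate the mismatch; only bit_length, the BigInt operators, and IO#write_byte are real stdlib API):

require "big"

# Hypothetical helper: write a BigInt big-endian, sized from its bit_length.
def write_big(io : IO, value : BigInt) : Nil
  byte_count = (value.bit_length + 7) // 8
  (byte_count - 1).downto(0) do |i|
    io.write_byte(((value >> (i * 8)) & 0xff).to_u8)
  end
end

value = BigInt.new(2) ** 128 - 1 # only needs 128 bits -> 16 bytes
io = IO::Memory.new
write_big(io, value)
io.to_slice.size # => 16, even though the format may call for a fixed 32 bytes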
So my proposal: LLVM natively supports integers with a bitwidth from 1 all the way to 16,777,215. That is some serious wiggle room (over 2MB). It would be amazing if Crystal were to use this capability and expose it to the user.
My idea is to modify Int to be a generic Int(N), where N is the bit width. This is similar to the way StaticArray does things, so it’s not unprecedented. All Int subtypes could then subclass Int(N), so Int32 would be defined as struct Int32 < Int(32). More than anything this would be to maintain compatibility with existing code.
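Purely as an illustration of the proposed surface (none of this compiles today, it’s hypothetical syntax):

abstract struct Int(N) # N = bit width, analogous to the N in StaticArray(T, N)
end

struct Int32 < Int(32) # existing types keep their current behaviour
end

struct Int256 < Int(256) # ...and new, wider widths become possible
end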
When it comes to LLVM IR this kind of stuff is easy to handle (from my limited experience). For example, to add two Int256 values:
define i256 @add(i256 %a, i256 %b) {
  ; adding two 256-bit integers is a single IR instruction, no library call needed
  %1 = add i256 %a, %b
  ret i256 %1
}
All of this would have an added benefit when it comes to packed structs, as arbitrary sized integers could be used as bitfields (see here).
Thoughts? Comments? Concerns?
I would be more than happy to work on this myself (though it’s been a while since I’ve played with the compiler so it could take a while).