Proposal: Support for arbitrary sized integers

I’m sure this is going to be shot down, but it’s something I’ve been thinking about for a while and wanted to get my thoughts down here. Who knows, maybe I’m not as alone in this as I think.

Preface: I know this is an edge case specific to me and probably a few others in the Crystal community. Most developers will likely never touch a 128-bit integer, let alone anything larger. However, that’s not to say that larger integers are never needed.

Currently Crystal natively supports integers up to 64 bits. 128-bit support is partially done, but it still causes issues with some mathematical operations and isn’t supported by the parser. Crystal also supports arbitrary-sized integers via BigInt, which uses GMP underneath.

Now there’s nothing wrong with GMP, but there are areas where it falls short. For one, BigInt has no notion of byte size. Sure, you can call bit_length and find out how many bytes the number will fit into, but that doesn’t help with serialization, which is the other issue.

Int subtypes can easily be written to an IO using IO#write_bytes. Part of the reason this is possible is that the bit width of the integer is known from the concrete Int struct. We know an Int32 is going to have 32 bits, or 4 bytes, so the number can be represented as exactly those 4 bytes. With a BigInt this is not the case.
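For reference, here’s a quick sketch of how a fixed-width integer round-trips through an IO today, using the standard IO::Memory and IO::ByteFormat:

```crystal
io = IO::Memory.new
io.write_bytes(1234_i32, IO::ByteFormat::LittleEndian)
io.size # => 4, always, because an Int32 is exactly 4 bytes

io.rewind
io.read_bytes(Int32, IO::ByteFormat::LittleEndian) # => 1234
```

The width is part of the type, so both sides of the wire agree on the byte count for free.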

Yes, it is still possible to fudge things. You can look at the bit_length of the BigInt, round up to the nearest byte size, and write the BigInt as if it were an integer of that size. This works if the value your BigInt holds happens to fill exactly the width you expect. But say your BigInt isn’t holding enough bits to align to the number of bytes you need: you might need a 256-bit integer, i.e. 32 bytes of data, but end up writing only 16 bytes.
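To make that concrete, here is a minimal sketch of the workaround, with a hypothetical write_fixed helper that zero-pads the value to a fixed byte width instead of trusting bit_length:

```crystal
require "big"

# Hypothetical helper: write `value` as exactly `width` bytes,
# little-endian, zero-padded at the high end.
def write_fixed(io : IO, value : BigInt, width : Int32)
  raise ArgumentError.new("value needs more than #{width} bytes") if value.bit_length > width * 8
  width.times do |i|
    # Extract byte i by shifting and masking; this works on BigInt too.
    io.write_byte(((value >> (8 * i)) & 0xff).to_u8)
  end
end

io = IO::Memory.new
write_fixed(io, BigInt.new(2) ** 100, 32) # fill a 256-bit slot
io.size # => 32, even though the value's bit_length is only 101
```

It works, but every caller has to carry the width around out-of-band, which is exactly what a real Int256 would make unnecessary.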

So, my proposal: LLVM natively supports integers with a bit width from 1 all the way up to 16,777,215. That is some serious wiggle room (over 2 MB). It would be amazing if Crystal were to use this capability and expose it to the user.

My idea is to modify Int to be a generic Int(N), where N is the bit width. This is similar to the way StaticArray does things, so it’s not unprecedented. All current Int subtypes could then inherit from Int(N), so Int32 would be defined as struct Int32 < Int(32). More than anything this would be to maintain compatibility with existing code.

When it comes to LLVM IR this kind of thing is easy to handle (from my limited experience). For example, to add two Int256 values:

```llvm
define i256 @add(i256 %a, i256 %b) {
  %1 = add i256 %a, %b
  ret i256 %1
}
```

All of this would have an added benefit when it comes to packed structs, as arbitrary-sized integers could be used as bitfields.

Thoughts? Comments? Concerns?
I would be more than happy to work on this myself (though it’s been a while since I’ve played with the compiler so it could take a while).

I would be interested in use cases.

If it works then please do. I thought Int128 would work out of the box, but then you need to depend on the LLVM runtime, and even then I think there are issues. I would be surprised if bigger-width ints worked out of the box.

Well, in my case the use case is an MTProto implementation. MTProto is a message-passing protocol developed for Telegram. MTProto itself doesn’t care which types you pass back and forth, as long as they can be serialized into a stream of bytes, but Telegram’s API defines multiple types which rely on Int128 and Int256.

For the time being I’m probably going to have to come up with my own wrapper types for BigInt which provide the necessary methods to (de)serialize Int128 and Int256 types, but it would be nice if I didn’t have to take such a hacky approach.
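For the curious, the wrapper I have in mind would look something like this. The name and layout are hypothetical, and MTProto’s actual encoding details are omitted; this is just the fixed-width (de)serialization part:

```crystal
require "big"

# Hypothetical fixed-width 256-bit value backed by a BigInt.
struct UInt256Wrapper
  BYTE_SIZE = 32

  getter value : BigInt

  def initialize(@value : BigInt)
  end

  # Read exactly BYTE_SIZE little-endian bytes into a BigInt.
  def self.from_io(io : IO) : self
    result = BigInt.new(0)
    BYTE_SIZE.times do |i|
      byte = io.read_byte || raise IO::EOFError.new
      result |= BigInt.new(byte) << (8 * i)
    end
    new(result)
  end

  # Write exactly BYTE_SIZE little-endian bytes, zero-padded.
  def to_io(io : IO)
    BYTE_SIZE.times do |i|
      io.write_byte(((value >> (8 * i)) & 0xff).to_u8)
    end
  end
end
```

It does the job, but it’s all boilerplate that a native Int(256) with IO#write_bytes support would eliminate.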

Another potential use case is blockchains like Ethereum, which I know sometimes rely on even larger integers (up to 4096 bits, I believe).

As mentioned this would also allow us to define bitfields for packed structs since you could represent a single bit.

I can reach out to Andrew, the creator of Zig. He implemented this in Zig a while back so he might have some tips.

I would think arbitrary-sized int and float types would help attract more interest from use cases in fields like higher mathematics, science, astronomy, and physics.
