**Proposal**
I'm proposing adding the operation `*%` to be used as `(x *% y) …% m`.
**Purpose**
Here, `(x *% y)` will be internally performed as a double-precision multiplication
to temporarily use twice the number of bytes for a given `Int` size to hold the
multiplication result before being reduced modulo `m` for its size.
This will allow all `Init::Primitive` type modulo math to be done accurately,
which now doesn't occur if the `(x * y)` multiplication overflows (i.e. doesn't
fit within the size of `m`).
**Rationale**
Crystal added the `&` operand to be used as `&*` and `&**` to prevent certain overflows.
However when doing modular-muliplications `(x &* y) % m` the results will still
silently overflow if the resultant multiplication value is greater than the `x|y` types,
causing an incorrect result, while producing no runtime error or warning.
To prevent this, the `(x * y)` operation must be done as a double-precision multiplication,
whose result is temporarily stored as twice the size of the largest `x|y` int type.
Then the modulo reduction is done to `m`'s type size the resultant value will be accurate.
Currently if one performs the following code, `(b &* b)` will run and silently overflow if
the result is greater than `b`s int size.
```
def powmod(b, e, m)
r, b = 1, b % m
while e > 0
r = (r &* b) % m if e.odd?
e >>= 1
b = (b &* b) % m
end
r
end
```
To prevent this `(b *% b) % m` would first perform a double-precision multiply, then the
modulo reduction, to preserve the full accuracy of the lost precision.
**Use Cases**
Using math for a given int type is faster than using `BigInt`s.
Thus, it's always better to prevent using `BigInt`s as much as possible if highest
performance is desired.
The following example shows where this proposal would make a great difference.
This code uses the `GMP` lib function to perform a `powmod`.
```
require "big"
@[Link("gmp")]
lib LibGMP
fun mpz_powm = __gmpz_powm(rop : MPZ*, base : MPZ*, exp : MPZ*, mod : MPZ*)
end
def powmodgmp(b, e, m)
y = BigInt.new
LibGMP.mpz_powm(y, b.to_big_i, e.to_big_i, m.to_big_i)
y
end
```
This works with any size int type, by first converting them all to `BigInt`s
to use in `GMP` function.
However when the inputs are not `BigInts` the Crystal `powmod` code is faster,
but because of the cited issues, will not produce accurate results when the
values for `b` cause overflows. To get the best of both worlds we can create
a hybrid `powmod` function.
```
require "big"
@[Link("gmp")]
lib LibGMP
fun mpz_powm = __gmpz_powm(rop : MPZ*, base : MPZ*, exp : MPZ*, mod : MPZ*)
end
def powmodgmp(b, e, m)
y = BigInt.new
LibGMP.mpz_powm(y, b.to_big_i, e.to_big_i, m.to_big_i)
y
end
def powmodint(b, e, m)
r = typeof(m).new(1)
b = b % m; b = typeof(m).new(b)
while e > 0
r = (r &* b) % m if e.odd?
e >>= 1
b = (b &* b) % m
end
r
end
def powmodopt(b, e, m)
if m > UInt64::MAX
r = powmodgmp(b, e, m)
else
m = m.to_u64
r = powmodint(b, e, m)
end
end
```
In the desired code the `&*` operations would be replace by `*%` (or equivalent).
Here `powmodint` works when the value of `m` is not (less than) a `BigInt`,
no matter the type of `b`.
In `powmodopt`, if the value of `m` is greater than the largest int type it's
a `BigInt`, so we use the `GMP` version.
Otherwise we use `powmodint`, which first reduces the value of `b` to fit in `m`
and then assigns its type as `m`s. Now `b` and `m` values are of the same type.
Thus doing the subsequent math is the fastest possible.
(Because currently, even though a `UInt128` is the largest int type, you can't
do `BigInt.to_(i|u)128`, so the code uses `if m > UInt64::MAX` to check the value of `m`.
See https://github.com/crystal-lang/crystal/issues/12771)
Below are some benchmarks showing the performance for different value cases.
Using the `GMP` version always produces accurate results.
The accuracy using `powmodopt` varies based on the overflow issue but performance still stands.
```
require "benchmark"
def powmodtests(b, e, m)
Benchmark.ips do |x|
x.report("powmodopt") { powmodopt(b, e, m) }
x.report("powmodgmp") { powmodgmp(b, e, m) }
puts
end
end
def tests(b, e, m)
puts "r = #{r = powmodopt(b, e, m)} of class #{r.class} using powmodopt"
puts "r = #{r = powmodgmp(b, e, m)} of class #{r.class} using powmodgmp"
powmodtests(b, e, m)
puts
end
puts
b = "329832983".to_big_i
e = 4843
m = "498422".to_big_i
tests b, e, m
b = "329832983".to_big_i
e = 4843
m = 498422.to_i64
tests b, e, m
b = "329832983".to_big_i
e = 4843
m = 498422.to_u64
tests b, e, m
b = "329832983".to_big_i
e = "4433923223".to_big_i
m = "498422".to_big_i
tests b, e, m
b = "329832983".to_big_i
e = 4433923223
m = 498422.to_i64
tests b, e, m
b = "329832983".to_big_i
e = 4433923223u64
m = 498422.to_u64
tests b, e, m
b = "3298329804304383423".to_big_i
e = 4433923224384234
m = "2372095723823498422".to_big_i
tests b, e, m
b = "3298329804304383423".to_big_i
e = 4433923224384234
m = 2372095723823498422.to_i64
tests b, e, m
b = "3298329804304383423".to_big_i
e = 4433923224384234
m = 2372095723823498422.to_u64
tests b, e, m
```
And here are the performance and accuracy results.
Except in one case (for some reason) `powmodopt` is generally `3x` faster.
Because these types of operations are generally used inside loops for many algorithms
you can see the potential overall algorithmic performance increase with this feature.
```
r = 107323 of class UInt64 using powmodopt
r = 107323 of class BigInt using powmodgmp
powmodopt 12.13M ( 82.43ns) (± 0.30%) 16.0B/op fastest
powmodgmp 3.88M (257.88ns) (± 0.20%) 32.0B/op 3.13× slower
r = 107323 of class UInt64 using powmodopt
r = 107323 of class BigInt using powmodgmp
powmodopt 14.60M ( 68.47ns) (± 0.21%) 16.0B/op fastest
powmodgmp 3.73M (268.13ns) (± 0.46%) 48.0B/op 3.92× slower
r = 107323 of class UInt64 using powmodopt
r = 107323 of class BigInt using powmodgmp
powmodopt 14.59M ( 68.53ns) (± 0.28%) 16.0B/op fastest
powmodgmp 3.71M (269.64ns) (± 0.41%) 48.0B/op 3.93× slower
r = 31565 of class UInt64 using powmodopt
r = 31565 of class BigInt using powmodgmp
powmodopt 1.23M (811.82ns) (± 0.26%) 1.05kB/op 1.86× slower
powmodgmp 2.29M (435.90ns) (± 0.25%) 16.0B/op fastest
r = 31565 of class UInt64 using powmodopt
r = 31565 of class BigInt using powmodgmp
powmodopt 7.22M (138.57ns) (± 0.31%) 16.0B/op fastest
powmodgmp 2.20M (454.80ns) (± 0.77%) 48.0B/op 3.28× slower
r = 31565 of class UInt64 using powmodopt
r = 31565 of class BigInt using powmodgmp
powmodopt 7.31M (136.86ns) (± 0.13%) 16.0B/op fastest
powmodgmp 2.19M (456.24ns) (± 0.19%) 48.0B/op 3.33× slower
r = 288018135440070645 of class UInt64 using powmodopt
r = 1898779714053406193 of class BigInt using powmodgmp
powmodopt 4.73M (211.52ns) (± 0.08%) 16.0B/op fastest
powmodgmp 1.71M (585.48ns) (± 0.17%) 32.0B/op 2.77× slower
r = 288018135440070645 of class UInt64 using powmodopt
r = 1898779714053406193 of class BigInt using powmodgmp
powmodopt 5.03M (198.83ns) (± 0.08%) 16.0B/op fastest
powmodgmp 1.68M (596.07ns) (± 0.12%) 48.0B/op 3.00× slower
r = 288018135440070645 of class UInt64 using powmodopt
r = 1898779714053406193 of class BigInt using powmodgmp
powmodopt 5.07M (197.16ns) (± 0.09%) 16.0B/op fastest
powmodgmp 1.67M (599.49ns) (± 0.16%) 48.0B/op 3.04× slower
```