Integer sqrt in stdlib

jzakiya · March 18, 2020, 9:45pm

Floating point numbers inherently can never represent 100% most integers/rationals.

Performing Math.sqrt(n).to_i begins to fail significantly as n becomes large.
I pointed this out to Ruby, which included the method Integer.sqrt in Ruby 2.5.

Including it will also make Crystal more accurate/useful for doing large numerical computations.

Ruby used the Newton method for sqrts, which I included in my gem roots.

Below is an example of a tested/verified Crystal implementation.
This version return value is the same class as its input value.

module Sqrt
  def sqrt
    n = self
    raise "can't take squareroot of negative integer" if n < 0
    return n if n < 2
    bits = n.to_s(2).size
    one = typeof(n).new(1)  # value 1 of type self
    x = one << ((bits - 1) >> 1) | n >> ((bits >> 1) + 1)
    while (t = n // x) < x; x = (x + t) >> 1 end
    x   # output is same integer class as input
  end
end

struct Int; include Sqrt end

# Use as this:

require "big"

puts "27378181.sqrt = #{x = 27378181.sqrt}, #{x.class}"
puts "1038654027378181.sqrt = #{x = 1038654027378181_u64.sqrt}, #{x.class}"
puts "671038654027378181.sqrt = #{x = 671038654027378181_i64.sqrt}, #{x.class}"
puts "(10**17).sqrt = #{x = (10_i64 ** 17).sqrt}, #{x.class}"
puts "(11**31).sqrt = #{x = (11.to_big_i ** 31).sqrt}, #{x.class}"
puts "(12**111).sqrt = #{x = (12.to_big_i ** 111).sqrt}, #{x.class}"

Produces this output:

27378181.sqrt = 5232, Int32
1038654027378181.sqrt = 32228155, UInt64
671038654027378181.sqrt = 819169490, Int64
(10**17).sqrt = 316227766, Int64
(11**31).sqrt = 13854364834150661, BigInt
(12**111).sqrt = 784438960743806544996211780136486395318587740656004078924591, BigInt

However it’s implemented/named its inclusion will make Crystal computationally more useful/accurate.

jzakiya · March 19, 2020, 2:55am

Ruby had a Math.sqrt(n) method which returned floating point values for all inputs n, so:
Math.sqrt(10) => 3.1622776601683795

It thus chose to include Integer.sqrt(n) so: Integer.sqrt(10) => 3

to eliminate any ambiguity for doing: 10.sqrt => 3|3.126.. (?)

However if up to me, I would create something like n.isqrt for integer values,
and let n.sqrt produce floats for all number types n. (You could even make isqrt
treat all its inputs as integers, like sqrt treats all its inputs as floats, sort of like ‘/’ vs ‘//’.)

It’s shorter, and looks more intuitive, natural, and iconic.

But these are implementation details. I can live with whatever is determined best.

straight-shoota · March 19, 2020, 3:18pm

What exact use case does this have, especially what benefit over Math.sqrt for Big* numbers?

jzakiya · March 20, 2020, 4:10am

Using this test code below we can see the loss of accuracy that
occurs for Math.sqrt(n) as n becomes larger.

The first line uses the previous given method for Integer sqrt,
compared to Math.sqrt(n) as a float, and converted to various integer types.

e = 19
puts "(10**#{e}).sqrt = #{x = (10.to_big_i ** e).sqrt}, #{x.class}"
puts "(10**#{e}).sqrt = #{x = Math.sqrt(10.to_big_i ** e).to_big_i}, #{x.class}"
puts "(10**#{e}).sqrt = #{x = Math.sqrt(10.to_big_i ** e)}, #{x.class}"
puts "(10**#{e}).sqrt = #{x = Math.sqrt(10.to_big_i ** e).to_i}, #{x.class}"
puts "(10**#{e}).sqrt = #{x = Math.sqrt(10.to_big_i ** e).to_u32}, #{x.class}"
puts "(10**#{e}).sqrt = #{x = Math.sqrt(10.to_big_i ** e).to_i64}, #{x.class}"
puts "(10**#{e}).sqrt = #{x = Math.sqrt(10.to_big_i ** e).to_u64}, #{x.class}"

Here sqrt exceeds I32 range

(10**19).sqrt = 3162277660, BigInt
(10**19).sqrt = 3162277660, BigInt
(10**19).sqrt = 3162277660.168379332, BigFloat
(10**19).sqrt = -1132689636, Int32
(10**19).sqrt = 3162277660, UInt32
(10**19).sqrt = 3162277660, Int64
(10**19).sqrt = 3162277660, UInt64

(10**20).sqrt = 10000000000, BigInt
(10**20).sqrt = 10000000000, BigInt
(10**20).sqrt = 10000000000.0, BigFloat
(10**20).sqrt = 1410065408, Int32
(10**20).sqrt = 1410065408, UInt32
(10**20).sqrt = 10000000000, Int64
(10**20).sqrt = 10000000000, UInt64

(10**30).sqrt = 1000000000000000, BigInt
(10**30).sqrt = 1000000000000000, BigInt
(10**30).sqrt = 1000000000000000.0, BigFloat
(10**30).sqrt = -1530494976, Int32
(10**30).sqrt = 2764472320, UInt32
(10**30).sqrt = 1000000000000000, Int64
(10**30).sqrt = 1000000000000000, UInt64

Here the sqrt excedes 64-bits range

(10**40).sqrt = 100000000000000000000, BigInt
(10**40).sqrt = 100000000000000000000, BigInt
(10**40).sqrt = 100000000000000000000.0, BigFloat
(10**40).sqrt = 1661992960, Int32
(10**40).sqrt = 1661992960, UInt32
(10**40).sqrt = 7766279631452241920, Int64
(10**40).sqrt = 7766279631452241920, UInt64

This is the limit for accuracy for Math.sqrt(n) as a float

(10**76).sqrt = 100000000000000000000000000000000000000, BigInt
(10**76).sqrt = 100000000000000000000000000000000000000, BigInt
(10**76).sqrt = 100000000000000000000000000000000000000.0, BigFloat
(10**76).sqrt = 0, Int32
(10**76).sqrt = 0, UInt32
(10**76).sqrt = 687399551400673280, Int64
(10**76).sqrt = 687399551400673280, UInt64

Next exponment value of 77 creates errors in Math.sqrt(n) for float values

(10**77).sqrt = 316227766016837933199889354443271853371, BigInt
(10**77).sqrt = 316227766016837933199889354443271853371, BigInt
(10**77).sqrt = 316227766016837933200000000000000000000.0, BigFloat
(10**77).sqrt = 1467897147, Int32
(10**77).sqrt = 1467897147, UInt32
(10**77).sqrt = 4387618993402172731, Int64
(10**77).sqrt = 4387618993402172731, UInt64

Now conversion of Math.sqrt(n).to_big_i produces incorrect result too

(10**78).sqrt = 1000000000000000000000000000000000000000, BigInt
(10**78).sqrt = 999999999999999999993126004485993267200, BigInt
(10**78).sqrt = 999999999999999999993000000000000000000.0, BigFloat
(10**78).sqrt = 0, Int32
(10**78).sqrt = 0, UInt32
(10**78).sqrt = 0, Int64
(10**78).sqrt = 0, UInt64

Even worse now for Math.sqrt(n)

(10**99).sqrt = 31622776601683793319988935444327185337195551393252, BigInt
(10**99).sqrt = 31622776601683793319988935444319318001775823290368, BigInt
(10**99).sqrt = 31622776601683793320000000000000000000000000000000.0, BigFloat
(10**99).sqrt = 0, Int32
(10**99).sqrt = 0, UInt32
(10**99).sqrt = 0, Int64
(10**99).sqrt = 0, UInt64

Sometimes now Math.sqrt(n) will produce correct values for even epxponents

(10**100).sqrt = 100000000000000000000000000000000000000000000000000, BigInt
(10**100).sqrt = 99999999999999999999999999999986929427981463977984, BigInt
(10**100).sqrt = 100000000000000000000000000000000000000000000000000.0, BigFloat
(10**100).sqrt = 0, Int32
(10**100).sqrt = 0, UInt32
(10**100).sqrt = 0, Int64
(10**100).sqrt = 0, UInt64

But inaccurate results for odd large exponents.

(10**101).sqrt = 316227766016837933199889354443271853371955513932521, BigInt
(10**101).sqrt = 316227766016837933199889354443266966994053071110144, BigInt
(10**101).sqrt = 316227766016837933200000000000000000000000000000000.0, BigFloat
(10**101).sqrt = 0, Int32
(10**101).sqrt = 0, UInt32
(10**101).sqrt = 0, Int64
(10**101).sqrt = 0, UInt64

Thus, Math.sqrt(n) becomes inaccurate as a float, and all integer type conversions,
after a certain size, making it unusable for whole classes of numerical algorithms
applied to large numbers (cryptography, number theory, combinatorics, prime number
theory, factorization, eliptic curves, etc, etc.).

Crystal is excellent for doing numerical processing. This is a standard issue having to
do with the limits of how floating point values can represent numbers past a certain
point, which is why Ruby (et al languages) have at least an Integer#sqrt equivalent.

asterite · March 20, 2020, 12:55pm

I think for numerical purposes, and accuracy, this is really nice.

Could you send a PR?

Also, I’m sure there’s a faster way to compute n.to_s(2).size, like, not needing to create a string and then compute it’s size (maybe continuously dividing by two?).

ComputerMage · March 20, 2020, 2:09pm

For n.to_s(2).size it is easier to count the number of 0 before first 1 is set and then extract it from the bit size of the containing type. IT will be much faster. It can be done as value & (1 << nbits) == 0 in the loop while decreasing the nbits from type size down to 0.

fxn · March 20, 2020, 2:28pm

I was thinking that Crystal uses GMP already, and GMP has related functions. Maybe just bind some?

jzakiya · March 20, 2020, 11:05pm

I used: n.to_s(2).size to find the bit length because it was
the most reliable that worked for all number sizes for n.

Compile the following code as:

crystal run bitslengthtest.cr -Ddisable_overflow --release

def bit_length(n : BigInt); Math.log2(n).ceil.to_i end

puts
puts "for: def bit_length(n : BigInt); Math.log2(n).ceil.to_i"
puts
exp = 308; n= 10.to_big_i ** exp; puts "for n = (10**#{exp}), n.to_s(2).size = #{n.to_s(2).size}, bit_length(n) = #{bit_length(n)}, Math.log2(n) = #{Math.log2(n)}"
puts
exp = 309; n= 10.to_big_i ** exp; puts "for n = (10**#{exp}), n.to_s(2).size = #{n.to_s(2).size}, bit_length(n) = #{bit_length(n)}, Math.log2(n) = #{Math.log2(n)}"
puts
exp = 310; n= 10.to_big_i ** exp; puts "for n = (10**#{exp}), n.to_s(2).size = #{n.to_s(2).size}, bit_length(n) = #{bit_length(n)}, Math.log2(n) = #{Math.log2(n)}"
puts
exp = 3010; n= 10.to_big_i ** exp; puts "for n = (10**#{exp}), n.to_s(2).size = #{n.to_s(2).size}, bit_length(n) = #{bit_length(n)}, Math.log2(n) = #{Math.log2(n)}"

It produces this output:

for: def bit_length(n : BigInt); Math.log2(n).ceil.to_i end

for n = (10**308), n.to_s(2).size = 1024, bit_length(n) = 1024, Math.log2(n) = 1023.1538532253076

for n = (10**309), n.to_s(2).size = 1027, bit_length(n) = -2147483648, Math.log2(n) = Infinity

for n = (10**310), n.to_s(2).size = 1030, bit_length(n) = -2147483648, Math.log2(n) = Infinity

for n = (10**3010), n.to_s(2).size = 10000, bit_length(n) = -2147483648, Math.log2(n) = Infinity

Again, any operation using floats will beome inaccurate past a certain (large) size for n.

Ruby has method bit_length.

(10**308).bit_length  => 1024 
(10**309).bit_length  => 1027 
(10**310).bit_length  => 1030 
(10**3010).bit_length => 10000

See its implemenation in https://github.com/ruby/ruby/blob/master/numeric.c starting at line 4894.

Since any representable integer value has a fixed binary representation its bit_length can be directly assessed from it.

Actually, the same issue arises for taking the nth root of any integer.
The sqrt of the Newton method is a specific case for any Integer nth roots.
A little bit more code can provide for users a huge amount of functionality not in other languages.

require "big"

module IntRoots

  def isqrt(n = self)   # Newton's method for integer sqrt
    raise "ivalid negative integer" if n < 0
    return n if n < 2
    b = n.to_s(2).size
    one = typeof(self).new(1)  # value 1 of type self
    x = one << ((b-1) >> 1) | n >> ((b >> 1) + 1)
    while (t = n // x) < x; x = (x + t) >> 1 end
    x
  end

  def irootn(n)        # Newton's method for integer nth root
    return nil if self < 0 && n.even?
    raise "root n not an Integer >= 2" unless n.is_a?(Int) && n > 1
    return self if self == 0 || (self == -1 && n.odd?)
    num = self.abs
    one = typeof(self).new(1)  # value 1 of type self
    e, x = n-1, one << (num.to_s(2).size-1) // n + 1
    while (t = (e * x + num // x ** e) // n) < x
      x = t 
    end
    x *= self < 0 ? -1 : 1
  end
end

struct Int; include IntRoots end

(10.to_big_i ** 517).iroot(13) => 5878016072274912830666108305036555474358

I don’t know if GMP library provides these methods.
But here, all we need is BigInt and no other dependencies.

I’ll create an issues topic in github to discuss all the details around this.

fxn · March 21, 2020, 9:52am

See the implementation of BigInt.

jzakiya · March 21, 2020, 2:17pm

BigInt only has Math.sqrt, (lines 696-708), which returns a BigFloat, with all the problems shown.

fxn · March 21, 2020, 2:51pm

Let me see if I can make my suggestion more clear.

Crystal implements BigInt using GMP, and GMP has functions to compute integer roots. Therefore, perhaps you could delegate the implementation to GMP, instead of rolling your own.

asterite · March 21, 2020, 3:50pm

I added bit_length to Int in this PR: https://github.com/crystal-lang/crystal/pull/8924

With that and using a GMP function we could implement this in an optimal way.

straight-shoota · March 22, 2020, 5:41pm

I’m doubting this would be much useful for non-big numbers. When operating which large ints where this matters, you’d easily be approaching the domain limits of fixed-size integer types. If you want safe arithmetics for large numbers, you need to use Big* anyway.
So implementing this only for Big* using the already availble gmp functions sounds great. If we only provided it for Big*, we should perhaps call the method BigInt.sqrt.

jzakiya · March 22, 2020, 6:24pm

The PR for bit_length creates incorrect results for large numbers as it uses Math.log2(n).to_i.

module BitLength
  def bit_length()
    x = self < 0 ? ~self : self

    if x.is_a?(Int::Primitive)
      self.class.new(sizeof(self) * 8 - x.leading_zeros_count)
    else
      self.class.new(Math.log2(x).to_i + 1)
    end
  end
end

struct Int; include BitLength end

# tests
puts "(10 ** 29).bit_length = #{(10.to_big_i ** 29).bit_length}"     # should be 97
puts "(10 ** 30).bit_length = #{(10.to_big_i ** 30).bit_length}"     # should be 100
puts "(10 ** 31).bit_length = #{(10.to_big_i ** 31).bit_length}"     # should be 103
puts "(10 ** 100).bit_length = #{(10.to_big_i ** 100).bit_length}"   # should be 333
puts "(10 ** 308).bit_length = #{(10.to_big_i ** 308).bit_length}"   # should be 1024
puts "(10 ** 309).bit_length = #{(10.to_big_i ** 309).bit_length.to_big_i}"   # should be 1027
puts "(10 ** 3010).bit_length = #{(10.to_big_i ** 3010).bit_length.to_big_i}" # should be 10000

However, the results are:

(10 ** 29).bit_length = 97
(10 ** 30).bit_length = 100
(10 ** 31).bit_length = 103
(10 ** 100).bit_length = 333
(10 ** 308).bit_length = 1024
(10 ** 309).bit_length = -2147483647
(10 ** 3010).bit_length = -2147483647

Again Math.log2(n) will always produce incorrect results for numbers past a certain size.
Does GMP provide an intrinsic method to perform bit_length?
I couldn’t find it, which is why I ended up using n.to_s(2).size as it seems to always work correctly.

Need to run spec against very large numbers, and check against results from n.to_s(2).size.

jzakiya · March 22, 2020, 6:40pm

For many (most) numerical algorithms you don’t know the dynamic range of your inputs for your data set. However, you always want your roots to be the same type as its value, to maintain data type continuity.

Your suggestion would mean the user would have to know the data type for each members of a data set, and then use the appropriate method. This is untenable, and unnecessary.

The suggested methods can work on all Integer types and return the same data type.

fxn · March 22, 2020, 6:46pm

However, you do not need it, right?

The nth root of an integer is strictly smaller than the receiver except for a couple of edge cases. You stay in the type. So one option could be to make it polymorphic: isqrt could be computed in a simple way for primitive ints (if possible), and call GMP for big ints.

It’s been a long time since I finished my degree, but I recall some number theoretic algorithms naturally used integer square roots. It definitely has use cases depending on your domain.

asterite · March 22, 2020, 6:51pm

Good point. I’ll change it to use to_s(2).size

jzakiya · March 22, 2020, 7:03pm

Ruby use the GMP library too, so it may be instructive to dig into the dirty details of what/how they actually implement bit_length to handle all the cases dynamically.

asterite · March 22, 2020, 8:01pm

I couldn’t find that function in GMP, and I believe Ruby doesn’t use GMP for all the operations.

rogerdpack · March 23, 2020, 4:25am

It seems weird to have two methods named the same thing that produce different results. Maybe it could be sqrt and sqrt_big or something? Just a thought…

Topic		Replies	Views
Integer sqrt not in std lib yet Crystal Contrib	1	306	July 18, 2021
Ruby -> Crystal: n.to_s[0].to_i Help & Support	11	741	March 12, 2020
Hack to convert [Big]Ints to any Int type Help & Support	0	211	May 8, 2023
BigInt powmod(b, e, m) or equivalent	1	458	March 3, 2019
Proposal: Support for arbitrary sized integers Crystal Contrib	5	574	February 25, 2021

Integer sqrt in stdlib

Related topics