Integer sqrt in stdlib

Floating point numbers inherently can never represent 100% most integers/rationals.

Performing Math.sqrt(n).to_i begins to fail significantly as n becomes large.
I pointed this out to Ruby, which included the method Integer.sqrt in Ruby 2.5.

Including it will also make Crystal more accurate/useful for doing large numerical computations.

Ruby used the Newton method for sqrts, which I included in my gem roots.

Below is an example of a tested/verified Crystal implementation.
This version return value is the same class as its input value.

module Sqrt
  def sqrt
    n = self
    raise "can't take squareroot of negative integer" if n < 0
    return n if n < 2
    bits = n.to_s(2).size
    one = typeof(n).new(1)  # value 1 of type self
    x = one << ((bits - 1) >> 1) | n >> ((bits >> 1) + 1)
    while (t = n // x) < x; x = (x + t) >> 1 end
    x   # output is same integer class as input
  end
end

struct Int; include Sqrt end

# Use as this:

require "big"

puts "27378181.sqrt = #{x = 27378181.sqrt}, #{x.class}"
puts "1038654027378181.sqrt = #{x = 1038654027378181_u64.sqrt}, #{x.class}"
puts "671038654027378181.sqrt = #{x = 671038654027378181_i64.sqrt}, #{x.class}"
puts "(10**17).sqrt = #{x = (10_i64 ** 17).sqrt}, #{x.class}"
puts "(11**31).sqrt = #{x = (11.to_big_i ** 31).sqrt}, #{x.class}"
puts "(12**111).sqrt = #{x = (12.to_big_i ** 111).sqrt}, #{x.class}"

Produces this output:

27378181.sqrt = 5232, Int32
1038654027378181.sqrt = 32228155, UInt64
671038654027378181.sqrt = 819169490, Int64
(10**17).sqrt = 316227766, Int64
(11**31).sqrt = 13854364834150661, BigInt
(12**111).sqrt = 784438960743806544996211780136486395318587740656004078924591, BigInt

However it’s implemented/named its inclusion will make Crystal computationally more useful/accurate.

1 Like

Ruby had a Math.sqrt(n) method which returned floating point values for all inputs n, so:
Math.sqrt(10) => 3.1622776601683795

It thus chose to include Integer.sqrt(n) so: Integer.sqrt(10) => 3

to eliminate any ambiguity for doing: 10.sqrt => 3|3.126.. (?)

However if up to me, I would create something like n.isqrt for integer values,
and let n.sqrt produce floats for all number types n. (You could even make isqrt
treat all its inputs as integers, like sqrt treats all its inputs as floats, sort of like ‘/’ vs ‘//’.)

It’s shorter, and looks more intuitive, natural, and iconic.

But these are implementation details. I can live with whatever is determined best.

What exact use case does this have, especially what benefit over Math.sqrt for Big* numbers?

Using this test code below we can see the loss of accuracy that
occurs for Math.sqrt(n) as n becomes larger.

The first line uses the previous given method for Integer sqrt,
compared to Math.sqrt(n) as a float, and converted to various integer types.

e = 19
puts "(10**#{e}).sqrt = #{x = (10.to_big_i ** e).sqrt}, #{x.class}"
puts "(10**#{e}).sqrt = #{x = Math.sqrt(10.to_big_i ** e).to_big_i}, #{x.class}"
puts "(10**#{e}).sqrt = #{x = Math.sqrt(10.to_big_i ** e)}, #{x.class}"
puts "(10**#{e}).sqrt = #{x = Math.sqrt(10.to_big_i ** e).to_i}, #{x.class}"
puts "(10**#{e}).sqrt = #{x = Math.sqrt(10.to_big_i ** e).to_u32}, #{x.class}"
puts "(10**#{e}).sqrt = #{x = Math.sqrt(10.to_big_i ** e).to_i64}, #{x.class}"
puts "(10**#{e}).sqrt = #{x = Math.sqrt(10.to_big_i ** e).to_u64}, #{x.class}"

Here sqrt exceeds I32 range

(10**19).sqrt = 3162277660, BigInt
(10**19).sqrt = 3162277660, BigInt
(10**19).sqrt = 3162277660.168379332, BigFloat
(10**19).sqrt = -1132689636, Int32
(10**19).sqrt = 3162277660, UInt32
(10**19).sqrt = 3162277660, Int64
(10**19).sqrt = 3162277660, UInt64
(10**20).sqrt = 10000000000, BigInt
(10**20).sqrt = 10000000000, BigInt
(10**20).sqrt = 10000000000.0, BigFloat
(10**20).sqrt = 1410065408, Int32
(10**20).sqrt = 1410065408, UInt32
(10**20).sqrt = 10000000000, Int64
(10**20).sqrt = 10000000000, UInt64
(10**30).sqrt = 1000000000000000, BigInt
(10**30).sqrt = 1000000000000000, BigInt
(10**30).sqrt = 1000000000000000.0, BigFloat
(10**30).sqrt = -1530494976, Int32
(10**30).sqrt = 2764472320, UInt32
(10**30).sqrt = 1000000000000000, Int64
(10**30).sqrt = 1000000000000000, UInt64

Here the sqrt excedes 64-bits range

(10**40).sqrt = 100000000000000000000, BigInt
(10**40).sqrt = 100000000000000000000, BigInt
(10**40).sqrt = 100000000000000000000.0, BigFloat
(10**40).sqrt = 1661992960, Int32
(10**40).sqrt = 1661992960, UInt32
(10**40).sqrt = 7766279631452241920, Int64
(10**40).sqrt = 7766279631452241920, UInt64

This is the limit for accuracy for Math.sqrt(n) as a float

(10**76).sqrt = 100000000000000000000000000000000000000, BigInt
(10**76).sqrt = 100000000000000000000000000000000000000, BigInt
(10**76).sqrt = 100000000000000000000000000000000000000.0, BigFloat
(10**76).sqrt = 0, Int32
(10**76).sqrt = 0, UInt32
(10**76).sqrt = 687399551400673280, Int64
(10**76).sqrt = 687399551400673280, UInt64

Next exponment value of 77 creates errors in Math.sqrt(n) for float values

(10**77).sqrt = 316227766016837933199889354443271853371, BigInt
(10**77).sqrt = 316227766016837933199889354443271853371, BigInt
(10**77).sqrt = 316227766016837933200000000000000000000.0, BigFloat
(10**77).sqrt = 1467897147, Int32
(10**77).sqrt = 1467897147, UInt32
(10**77).sqrt = 4387618993402172731, Int64
(10**77).sqrt = 4387618993402172731, UInt64

Now conversion of Math.sqrt(n).to_big_i produces incorrect result too

(10**78).sqrt = 1000000000000000000000000000000000000000, BigInt
(10**78).sqrt = 999999999999999999993126004485993267200, BigInt
(10**78).sqrt = 999999999999999999993000000000000000000.0, BigFloat
(10**78).sqrt = 0, Int32
(10**78).sqrt = 0, UInt32
(10**78).sqrt = 0, Int64
(10**78).sqrt = 0, UInt64

Even worse now for Math.sqrt(n)

(10**99).sqrt = 31622776601683793319988935444327185337195551393252, BigInt
(10**99).sqrt = 31622776601683793319988935444319318001775823290368, BigInt
(10**99).sqrt = 31622776601683793320000000000000000000000000000000.0, BigFloat
(10**99).sqrt = 0, Int32
(10**99).sqrt = 0, UInt32
(10**99).sqrt = 0, Int64
(10**99).sqrt = 0, UInt64

Sometimes now Math.sqrt(n) will produce correct values for even epxponents

(10**100).sqrt = 100000000000000000000000000000000000000000000000000, BigInt
(10**100).sqrt = 99999999999999999999999999999986929427981463977984, BigInt
(10**100).sqrt = 100000000000000000000000000000000000000000000000000.0, BigFloat
(10**100).sqrt = 0, Int32
(10**100).sqrt = 0, UInt32
(10**100).sqrt = 0, Int64
(10**100).sqrt = 0, UInt64

But inaccurate results for odd large exponents.

(10**101).sqrt = 316227766016837933199889354443271853371955513932521, BigInt
(10**101).sqrt = 316227766016837933199889354443266966994053071110144, BigInt
(10**101).sqrt = 316227766016837933200000000000000000000000000000000.0, BigFloat
(10**101).sqrt = 0, Int32
(10**101).sqrt = 0, UInt32
(10**101).sqrt = 0, Int64
(10**101).sqrt = 0, UInt64

Thus, Math.sqrt(n) becomes inaccurate as a float, and all integer type conversions,
after a certain size, making it unusable for whole classes of numerical algorithms
applied to large numbers (cryptography, number theory, combinatorics, prime number
theory, factorization, eliptic curves, etc, etc.).

Crystal is excellent for doing numerical processing. This is a standard issue having to
do with the limits of how floating point values can represent numbers past a certain
point, which is why Ruby (et al languages) have at least an Integer#sqrt equivalent.

1 Like

I think for numerical purposes, and accuracy, this is really nice.

Could you send a PR?

Also, I’m sure there’s a faster way to compute n.to_s(2).size, like, not needing to create a string and then compute it’s size (maybe continuously dividing by two?).

1 Like

For n.to_s(2).size it is easier to count the number of 0 before first 1 is set and then extract it from the bit size of the containing type. IT will be much faster. It can be done as value & (1 << nbits) == 0 in the loop while decreasing the nbits from type size down to 0.

2 Likes

I was thinking that Crystal uses GMP already, and GMP has related functions. Maybe just bind some?

3 Likes

I used: n.to_s(2).size to find the bit length because it was
the most reliable that worked for all number sizes for n.

Compile the following code as:

crystal run bitslengthtest.cr -Ddisable_overflow --release

def bit_length(n : BigInt); Math.log2(n).ceil.to_i end

puts
puts "for: def bit_length(n : BigInt); Math.log2(n).ceil.to_i"
puts
exp = 308; n= 10.to_big_i ** exp; puts "for n = (10**#{exp}), n.to_s(2).size = #{n.to_s(2).size}, bit_length(n) = #{bit_length(n)}, Math.log2(n) = #{Math.log2(n)}"
puts
exp = 309; n= 10.to_big_i ** exp; puts "for n = (10**#{exp}), n.to_s(2).size = #{n.to_s(2).size}, bit_length(n) = #{bit_length(n)}, Math.log2(n) = #{Math.log2(n)}"
puts
exp = 310; n= 10.to_big_i ** exp; puts "for n = (10**#{exp}), n.to_s(2).size = #{n.to_s(2).size}, bit_length(n) = #{bit_length(n)}, Math.log2(n) = #{Math.log2(n)}"
puts
exp = 3010; n= 10.to_big_i ** exp; puts "for n = (10**#{exp}), n.to_s(2).size = #{n.to_s(2).size}, bit_length(n) = #{bit_length(n)}, Math.log2(n) = #{Math.log2(n)}"

It produces this output:

for: def bit_length(n : BigInt); Math.log2(n).ceil.to_i end

for n = (10**308), n.to_s(2).size = 1024, bit_length(n) = 1024, Math.log2(n) = 1023.1538532253076

for n = (10**309), n.to_s(2).size = 1027, bit_length(n) = -2147483648, Math.log2(n) = Infinity

for n = (10**310), n.to_s(2).size = 1030, bit_length(n) = -2147483648, Math.log2(n) = Infinity

for n = (10**3010), n.to_s(2).size = 10000, bit_length(n) = -2147483648, Math.log2(n) = Infinity

Again, any operation using floats will beome inaccurate past a certain (large) size for n.

Ruby has method bit_length.

(10**308).bit_length  => 1024 
(10**309).bit_length  => 1027 
(10**310).bit_length  => 1030 
(10**3010).bit_length => 10000 

See its implemenation in https://github.com/ruby/ruby/blob/master/numeric.c starting at line 4894.

Since any representable integer value has a fixed binary representation its bit_length can be directly assessed from it.

Actually, the same issue arises for taking the nth root of any integer.
The sqrt of the Newton method is a specific case for any Integer nth roots.
A little bit more code can provide for users a huge amount of functionality not in other languages.

require "big"

module IntRoots

  def isqrt(n = self)   # Newton's method for integer sqrt
    raise "ivalid negative integer" if n < 0
    return n if n < 2
    b = n.to_s(2).size
    one = typeof(self).new(1)  # value 1 of type self
    x = one << ((b-1) >> 1) | n >> ((b >> 1) + 1)
    while (t = n // x) < x; x = (x + t) >> 1 end
    x
  end

  def irootn(n)        # Newton's method for integer nth root
    return nil if self < 0 && n.even?
    raise "root n not an Integer >= 2" unless n.is_a?(Int) && n > 1
    return self if self == 0 || (self == -1 && n.odd?)
    num = self.abs
    one = typeof(self).new(1)  # value 1 of type self
    e, x = n-1, one << (num.to_s(2).size-1) // n + 1
    while (t = (e * x + num // x ** e) // n) < x
      x = t 
    end
    x *= self < 0 ? -1 : 1
  end
end

struct Int; include IntRoots end

(10.to_big_i ** 517).iroot(13) => 5878016072274912830666108305036555474358

I don’t know if GMP library provides these methods.
But here, all we need is BigInt and no other dependencies.

I’ll create an issues topic in github to discuss all the details around this.

See the implementation of BigInt.

BigInt only has Math.sqrt, (lines 696-708), which returns a BigFloat, with all the problems shown.

Let me see if I can make my suggestion more clear.

Crystal implements BigInt using GMP, and GMP has functions to compute integer roots. Therefore, perhaps you could delegate the implementation to GMP, instead of rolling your own.

1 Like

I added bit_length to Int in this PR: https://github.com/crystal-lang/crystal/pull/8924

With that and using a GMP function we could implement this in an optimal way.

1 Like

I’m doubting this would be much useful for non-big numbers. When operating which large ints where this matters, you’d easily be approaching the domain limits of fixed-size integer types. If you want safe arithmetics for large numbers, you need to use Big* anyway.
So implementing this only for Big* using the already availble gmp functions sounds great. If we only provided it for Big*, we should perhaps call the method BigInt.sqrt.

2 Likes

The PR for bit_length creates incorrect results for large numbers as it uses Math.log2(n).to_i.

module BitLength
  def bit_length()
    x = self < 0 ? ~self : self

    if x.is_a?(Int::Primitive)
      self.class.new(sizeof(self) * 8 - x.leading_zeros_count)
    else
      self.class.new(Math.log2(x).to_i + 1)
    end
  end
end

struct Int; include BitLength end

# tests
puts "(10 ** 29).bit_length = #{(10.to_big_i ** 29).bit_length}"     # should be 97
puts "(10 ** 30).bit_length = #{(10.to_big_i ** 30).bit_length}"     # should be 100
puts "(10 ** 31).bit_length = #{(10.to_big_i ** 31).bit_length}"     # should be 103
puts "(10 ** 100).bit_length = #{(10.to_big_i ** 100).bit_length}"   # should be 333
puts "(10 ** 308).bit_length = #{(10.to_big_i ** 308).bit_length}"   # should be 1024
puts "(10 ** 309).bit_length = #{(10.to_big_i ** 309).bit_length.to_big_i}"   # should be 1027
puts "(10 ** 3010).bit_length = #{(10.to_big_i ** 3010).bit_length.to_big_i}" # should be 10000

However, the results are:

(10 ** 29).bit_length = 97
(10 ** 30).bit_length = 100
(10 ** 31).bit_length = 103
(10 ** 100).bit_length = 333
(10 ** 308).bit_length = 1024
(10 ** 309).bit_length = -2147483647
(10 ** 3010).bit_length = -2147483647

Again Math.log2(n) will always produce incorrect results for numbers past a certain size.
Does GMP provide an intrinsic method to perform bit_length?
I couldn’t find it, which is why I ended up using n.to_s(2).size as it seems to always work correctly.

Need to run spec against very large numbers, and check against results from n.to_s(2).size.

1 Like

For many (most) numerical algorithms you don’t know the dynamic range of your inputs for your data set. However, you always want your roots to be the same type as its value, to maintain data type continuity.

Your suggestion would mean the user would have to know the data type for each members of a data set, and then use the appropriate method. This is untenable, and unnecessary.

The suggested methods can work on all Integer types and return the same data type.

However, you do not need it, right?

The nth root of an integer is strictly smaller than the receiver except for a couple of edge cases. You stay in the type. So one option could be to make it polymorphic: isqrt could be computed in a simple way for primitive ints (if possible), and call GMP for big ints.

It’s been a long time since I finished my degree, but I recall some number theoretic algorithms naturally used integer square roots. It definitely has use cases depending on your domain.

Good point. I’ll change it to use to_s(2).size

Ruby use the GMP library too, so it may be instructive to dig into the dirty details of what/how they actually implement bit_length to handle all the cases dynamically.

I couldn’t find that function in GMP, and I believe Ruby doesn’t use GMP for all the operations.

It seems weird to have two methods named the same thing that produce different results. Maybe it could be sqrt and sqrt_big or something? Just a thought…