Why does Hash.size return Int32 and not UInt32?

It would seem better if Hash#size returned a UInt, which is more specific.

UInt is a pretty specific type. It shouldn't be used just to restrict the range of possible values, only when it is really needed.
https://carc.in/#/r/6eut

class MyHash
  # Simplified stand-in for a Hash whose size is a UInt32.
  def size
    0_u32
  end
end

h = MyHash.new
# size is 0_u32, so `h.size - 1` underflows: it either wraps around to
# 4294967295 (and "hash too big" is printed) or raises, depending on
# whether checked arithmetic is enabled. Neither is what the author meant.
if h.size - 1 > 1000
  puts "hash too big"
end

Int32 is the default integer type in Crystal. All stdlib methods returning an integer should return Int32 unless there are very specific reasons not to. The reason is that math operations mixing different types are prone to error (because an unsigned type can easily lead to an overflow). That's why for sizes and other dimensions you should always use signed integers, even if the effective value range is limited to non-negative numbers.
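A minimal sketch of the contrast, assuming a Crystal version where the plain operators are overflow-checked and the &- operator wraps (the exact behaviour depends on the compiler version):

signed = 0                          # Int32, the stdlib default
too_big = signed - 1 > 1000
puts too_big                        # => false: -1 compares as expected

unsigned = 0_u32
too_big = unsigned &- 1_u32 > 1000
puts too_big                        # => true: the subtraction wraps to 4294967295
# a plain `unsigned - 1` raises OverflowError when checked arithmetic is on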

Would checked overflow allow using more specific types like UInt32 here, since there would then be no risk of a silent error due to underflow?

In that case my example (size - 1 > 1000) will raise at runtime when size is 0. I don't think that's good behavior.
Making size a UInt32 doesn't add safety. The benefit is that you can have arrays with more than 2 billion elements, but you pay for it with less convenient math operations. It would be much better to have an Int64 size, but that would severely degrade performance on 32-bit architectures.

As I see it, the problem doesn't lie with the type of size, which indeed seems natural to fit into an unsigned integer, but with the subtraction operator. On the one hand, if we declare UInt - Number => UInt, we risk underflow; on the other hand, if UInt - Number => Int, we risk overflow. Which is nothing new, any language has to do something about this.

The word size is irrelevant here in my opinion; I read this thread as "don't use UInt at all", which seems a bit too strong a statement to me.

If you make size an Int32 you don't have problems with overflow/underflow (unless you use really big numbers). So what's the benefit of making it unsigned?
I don't think it helps catch any errors.
I think UInt32 is needed for binary protocols, hash functions and some other places where its overflow behaviour is exactly what we expect and need, but not for general computations. Yes, it is a controversial topic - I've seen two holy wars about signed vs unsigned container sizes in different languages. What is more important: one more bit of possible size, or easier-to-understand behavior?
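For instance, here is a sketch of the kind of hashing code where UInt32 wrap-around is exactly the desired behaviour. The djb2-style function is picked purely for illustration; it uses Crystal's wrapping operators (&* and &+), so the overflow is intentional rather than an error:

# Wrap-around on UInt32 is the point here, not a bug:
# the hash value is meant to be reduced modulo 2**32.
def djb2(s : String) : UInt32
  hash = 5381_u32
  s.each_byte do |byte|
    hash = (hash &* 33_u32) &+ byte.to_u32
  end
  hash
end

puts djb2("crystal")   # some UInt32 value; overflowing along the way is expected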
One more funny example.

# imagine we can't use `reverse` because e.g. the array can change inside the loop
i = arr.size - 1
while i >= 0
  puts arr[i]
  i -= 2
end

This won't work if size is unsigned; you have to use

i = arr.size - 1
while i < arr.size # yes, that's not a mistake: loop while i is less than size, because i wraps around to a huge value once it goes below zero
  puts arr[i]
  i -= 2
end

No, it's "don't use UInt for math". Most use cases for number types revolve around some kind of mathematical computation. To make that work you need signed integers, because that's the usual domain for math calculations.

This whole thread is actually a surprise to me. I hadn't thought about it much, but I was under the impression that boundary checks are built in and 0_u - 1 would just raise.

don’t use UInt for math

Why does Crystal have UInt#-() then? Surely it would convey this message much more strongly if you had to cast a UInt to an appropriate type by hand?

Raising won't help here. You get an exception instead of a silent bug - that is better, but it still isn't what you need when iterating an array. You have to pay extra effort to deal with unsigned integers (unsigned sizes in this case).

What is a negative size? Does it have a physical sense?

Maybe we should introduce a new type family: Size32 et al. that will not have math operators at all.
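As a purely hypothetical sketch (the Size32 name and its interface are invented here for illustration), such a type could wrap a UInt32, keep comparisons, and simply not define arithmetic, forcing an explicit cast before any math:

# Hypothetical sketch: a size type with comparisons but no arithmetic,
# so expressions like `size - 1` fail to compile instead of underflowing.
struct Size32
  include Comparable(Size32)

  getter value : UInt32

  def initialize(@value : UInt32)
  end

  def <=>(other : Size32)
    value <=> other.value
  end

  # Explicit, visible opt-in before doing any math with the size.
  # (Would raise if the value didn't fit into Int32.)
  def to_i32 : Int32
    value.to_i32
  end
end

size = Size32.new(0_u32)
# size - 1            # would not compile: undefined method '-' for Size32
i = size.to_i32 - 1   # => -1, via an explicit cast to a signed type
puts i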

I don't get the "physical sense" argument. Yes, we pay 1 bit (half of the possible range) to have fewer problems with overflow. If that 1 bit is important to us, we can make our own container. If it is important in most cases, we can discuss making size unsigned by default.
But having a physical sense won't by itself solve any problems or help in any other way.

No, it's the other way around for me: I don't want to think in bits, I don't really care about the integer width, I optimize for semantics, for thinking. There is no law that says Array#size should return an Int at all, and in fact in JS it doesn't. Once you have a type system, its only benefit is creating a mental model that helps you reason about the problem, the solution and the code, and having to worry about over- and underflows should be left to those who optimize, not be the default position for an author to be in.

Size in general can't be negative, so I want my type system to reflect that. In the absence of a specialized type I pick something that has this property, a UInt, but now there is a risk of accidentally using it in the wrong context, because it is a number after all, so I propose we fix that.

But why does Crystal return an Int while other languages like Rust return a UInt from vec.len? I feel like that does guarantee the length is non-negative.

There is no universal good answer here, you have to pick a compromise (or expose options to the user). Rust chose one set of positive outcomes, Crystal chose another.

Well, that makes sense.
But at a low level the reality is that if we want to write correct code, we have to either

  • limit the size to 2147483647. Maybe a type that has half of the Int32 range (https://github.com/crystal-lang/crystal/issues/2747) could be useful, but it would most likely have a performance impact that is undesirable for such a basic type as Hash.
  • work correctly with UInt32, taking care of overflows ourselves (as in my example).
  • convert it to Int64 for calculations (this will degrade performance on 32-bit architectures, possibly even on 64-bit ones); see the sketch after this list.
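A minimal sketch of the third option, assuming the size comes back as a UInt32: widening to Int64 before doing any math sidesteps both underflow and overflow for every possible 32-bit size value.

size = 3_000_000_000_u32   # larger than Int32::MAX, still fits in UInt32

# All arithmetic happens in Int64, so `size - 1` can never underflow
# and values above Int32::MAX are still represented exactly.
if size.to_i64 - 1 > 1000
  puts "hash too big"
end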

Rust and DLang have an unsigned size; Go and Kotlin (and perhaps all JVM-based languages) have a signed one.

Of course I understand the implications. Also, n / size will raise on a zero size, which is fine. My position is this: just as there is no number that can correctly represent the result of a division by zero, there should be no number that represents the result of 0_u - 1, even if mechanically it's simple to just wrap around zero.

This is not a low-level discussion on my part; keeping in mind that all of this has already been discussed a million times, this is an argument about safety, correctness and the static type system.

Let's imagine there is no such number and the compiler checks everything for us. How should this code

i = arr.size - 1
while i >= 0
  puts arr[i]
  i -= 1
end

behave? Should it iterate the array correctly, or does it have to be rewritten?

There are several distinct topics here, let me try and unpack them.

Compile time

  • If we make a specialized Size type, then lines like i = arr.size - 1 and i -= 1 should become compile errors, because there would be no Size#-() method.
  • Then the user should realize that there is either a need for an explicit cast, or a misconception in the code in general; in this example that would be the manual array iteration in the first place, instead of arr.reverse_each { |n| puts n } (see the sketch after this list).
  • This is a far-fetched idea, but the compiler could help the user even without modifications to the type system, for example by issuing a warning on the same i -= 1 line, suggesting it be changed to i -= 1 unless i.zero? or something similar.
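For completeness, here is the idiomatic rewrite mentioned in the second point; it drops the manual index entirely, so the signed/unsigned question never comes up (reverse_each is the standard method from Indexable, nothing hypothetical here):

arr = [1, 2, 3, 4]

# Walks the array from the last element to the first without any
# explicit index arithmetic; works fine for an empty array too.
arr.reverse_each { |n| puts n }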

Runtime

In any case there should be a runtime boundary check, which will have to safeguard against overflows and underflows, keeping the values true to their types. In this example, provided that arr.size is unsigned, it should raise something like an UnderflowError upon encountering this condition.
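As a point of reference, recent Crystal releases behave roughly along these lines: plain integer arithmetic is overflow-checked and raises OverflowError (there is no separate UnderflowError), while the &-prefixed operators opt in to wrap-around explicitly. A small sketch:

i = 0_u32

begin
  i = i - 1        # checked arithmetic: raises instead of wrapping
rescue ex : OverflowError
  puts "underflow caught: #{ex.class}"
end

i = i &- 1_u32     # wrapping operator: explicit opt-in to wrap-around
puts i             # => 4294967295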