Confused about method chaining on struct


#1

I find it confusing how method chaining works on a mutable struct.

If you have code like this:

struct MyStruct
  def initialize(@a = 0)
  end

  def plus_1
    @a += 1
    self
  end

  def to_s(io)
    io << @a
  end
end

a = MyStruct.new
printf "result of a.plus_1        = %s,  after that a = %s\n", a.plus_1, a
printf "result of a.plus_1.plus_1 = %s,  after that a = %s\n", a.plus_1.plus_1, a

It gives you the result:

result of a.plus_1        = 1,  after that a = 1
result of a.plus_1.plus_1 = 3,  after that a = 2

It looks like the first method call acts on the struct itself, but the second call acts on a copy of the result of the first call. So the struct value gets modified by the first call, but the second call modifies only the copy, which is discarded after printing.

That looks strange. I would think that calling a method on an object is not the same as passing the object as a parameter to a function. When a struct is passed as a parameter, it is passed by value (copied), and that is fine. But when a struct is the receiver of a method, you don’t expect the receiver to be copied. And for the first call it is not. That’s fine. But for the second, chained call this does not hold. And that is strange.

I would find it OK if the receiver were always copied. Then a struct would be really immutable unless you explicitly assigned the result back to it. And then you would know that method calls just cannot modify the receiver “in place”, as a struct is a Value and immutable.

Of course, this code works fine if you change struct to class. Then everything works without surprises, as a class is a Reference and is passed by reference.

Any comments on that, please?


#2

For the second call, the receiver is a.plus_1, not a.


#3

Yes. But plus_1 returns self, so I would expect it to be the same object.


#4

Returning any struct results in a copy of that struct, even if that is self.

That is not to say that this is the desired behavior, but it’s how things work and it’s a bit hard to change.
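A minimal sketch of what that copy means in practice, reusing MyStruct from the first post. The `getter a` line is an addition made only so the values can be inspected here; it is not part of the original code.

```crystal
# MyStruct as in the first post; `getter a` is added only for this illustration.
struct MyStruct
  getter a

  def initialize(@a = 0)
  end

  def plus_1
    @a += 1
    self # returned by value, so the caller receives a copy
  end
end

a = MyStruct.new
b = a.plus_1 # a is mutated in place; b is a copy of the updated a
b.plus_1     # mutates only b, a is not affected

puts a.a # => 1
puts b.a # => 2
```

In `a.plus_1.plus_1` the intermediate copy plays the role of `b` above, except that it has no name and is thrown away afterwards.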


#5

And in general mutable structs are a smell (in Crystal). That’s why it’s recommended that struct should be immutable (but this is not enforced and you have to be a bit careful).
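As a rough sketch of the immutable style, here is one way MyStruct from the first post could be written so that chaining has no surprises. The `getter a` line and the choice to return `MyStruct.new(...)` are assumptions made for illustration, not something prescribed by the language.

```crystal
# Immutable variant of MyStruct; `getter a` is added only for this illustration.
struct MyStruct
  getter a

  def initialize(@a = 0)
  end

  # Returns a new value instead of mutating the receiver.
  def plus_1
    MyStruct.new(@a + 1)
  end
end

a = MyStruct.new
b = a.plus_1.plus_1 # every call in the chain produces a fresh value

puts a.a # => 0
puts b.a # => 2
```

With this style the receiver is never changed, so the caller has to keep the result (for example `a = a.plus_1`) to see any effect.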


#6

Thanks, @asterite. Clear explanation. I think it would be good to cover that aspect in more detail in the language reference page on structs, as method chaining is such a popular style in many languages now. Would it be fine if I create a PR for this page to try to make it better?


#7

I agree that the documentation needs to be changed. It seems to say that the only practical difference between classes & structs is that structs are passed by value, which I understood to mean when being passed to a method. In fact it is much more than that. Merely assigning:
b = a
where a is a struct, copies the struct, so that b and a are completely independent from each other. And it appears to be the same when a struct is returned from a method. In other words they behave to some extent like a number or a char. This is only spelt out clearly if you look at the documentation of Value.
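A small sketch of that difference on assignment. The type names ValueCounter and RefCounter are hypothetical and exist only for this comparison.

```crystal
# Hypothetical types used only to compare value and reference semantics.
struct ValueCounter
  property n = 0
end

class RefCounter
  property n = 0
end

s1 = ValueCounter.new
s2 = s1 # struct: s2 is an independent copy
s2.n = 5

r1 = RefCounter.new
r2 = r1 # class: r2 refers to the same object as r1
r2.n = 5

puts s1.n # => 0
puts r1.n # => 5
```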

The other thing that confuses me is the statement that structs are allocated on the stack. I do not know anything about the internals of Crystal, but I tried the following:

struct MyStruct
  def initialize(@a = 0)
  end

  def to_s(io)
    io << @a
  end
end

def test(a, n)
  a << MyStruct.new(n)
end

a = Array(MyStruct).new
10.times do |i|
  test(a, i)
  puts a[i]
end
a.each do |s|
  puts s
end

I expected the structs created in test to overwrite each other as they would be created in the stack frame of test, so the final loop would write out all 9s. But for some reason this doesn’t happen. It works just as if MyStruct was a class. Maybe it is a different stack? Or is my test too simplistic?


#8

Hi, @mselig. When you create your struct with ‘new’ in the test method, the struct is created and pushed onto the stack. When you push it to the array, a copy of that struct is put into the memory address pointed to by a[i], and the original struct is popped from the stack. The same would happen with any primitive type like an int. That’s perfectly normal.


#9

Hi @avitkauskas, I’m sure you are right. The problem for me is that the documentation says:

If you declare your type as a struct instead of a class , creating an instance of it will use stack memory, which is much cheaper than heap memory and doesn’t put pressure on the GC.

What this test shows is that, when used in another class, it is again copied rather than pointed to. But it must be copied to the heap (possibly as part of the allocation for the class instance using the struct), right? I understood this documentation to mean that all instances of a struct are on the stack (including copies), which cannot be. So again the documentation is not really accurate. At least in this case what happens is reasonably intuitive!


#10

@mselig Yes, the struct instances in your example are eventually stored in heap memory. But they don’t allocate any heap memory themselves. They’re simply written to memory that has already been allocated to the array’s buffer.

The documentation states that when you create an instance, it will be on the stack. Obviously, every value stored on the stack can also be put on the heap in some way, for example as an instance variable of a data structure, or as an array item. But value types don’t allocate their own memory on the heap, neither implicitly nor explicitly.
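One visible consequence of array items being stored by value can be sketched like this, using a mutable MyStruct as in the first post. The `getter a` line is added only for this illustration.

```crystal
# Mutable MyStruct as in the first post; `getter a` is added only for this illustration.
struct MyStruct
  getter a

  def initialize(@a = 0)
  end

  def plus_1
    @a += 1
    self
  end
end

arr = [MyStruct.new(1)]

arr[0].plus_1 # Array#[] returns a copy, so only that temporary is mutated
puts arr[0].a # => 1

elem = arr[0] # take a copy out of the array
elem.plus_1   # mutate the local copy
arr[0] = elem # write the updated value back into the array's buffer
puts arr[0].a # => 2
```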


#11

@straight-shoota, I guess my issue is that I found the documentation confusing and inaccurate. Saying “creating an instance of it will use stack memory” may actually be true, but it may also be that:
array_of_struct << MyStruct.new
is optimised so that the struct is actually created directly in the array space, avoiding a copy from the stack. IMHO it would be far better to say that structs are treated as values rather than references (as the ancestor class implies!), and that this means that they may be copied as they are used, which may cause unexpected results. You wouldn’t describe the behaviour of Numbers or Chars as being allocated on the stack - they are simply stored where they are used, and this also appears to be the behaviour of structs, right? The extra complexity in understanding structs is because they are mutable.

Anyhow back to the original chaining issue of this thread…
Would it be at all possible to treat return self in a Struct method as a special case and not return a copy? Unlike returning any old struct, we know that self must exist in the calling method.


#12

@mselig, everything is fine with Crystal concerning structs, value objects and the stack. Crystal, as a programming language, does not invent anything new here (nor should it); it uses all the conventional principles and terminology. If it confuses you (and me) sometimes, then it’s only because of our lack of knowledge, not because of a design deficiency in the language.

Concerning the method chaining, I also checked with C#, which has the same idea of structs on the stack and classes on the heap, and the behaviour in C# is absolutely the same. And this is good. If you really think about Value objects and passing by value, the behaviour of copying even when returning self actually makes sense.

BTW, I created a pull request to update the struct documentation page. If it is accepted, there will be more details there, which should reduce the confusion.


#13

The expression array_of_struct << MyStruct.new first creates a new instance of MyStruct and then uses that value as an argument to #<< on array_of_struct. There is no room for different interpretations.

Whether there might be any optimization applied to the binary execution code doesn’t matter. Optimization doesn’t change semantics.


#14

Hi @straight-shoota, I think you may have misunderstood what I was saying. There is no change to any interpretation or semantics in what I suggested. I was simply pointing out that there is a possible implementation of that Crystal statement that does not involve allocating the struct on the stack.