Mutable structs - what am I doing wrong?

I have been trying to use a mutable STRUCT in a situation that I think it should work in, but its behaviour is confusing me. I have reduced the problem to the following example:

struct Test
 getter val : Int32

 def initialize()
   @val = 0
 end

 def inc
puts " inc before - val = #{val}"
  @val += 1
puts " inc after - val = #{val}"
 end

 def reset
  @val = 0
 end
end

TESTS = { Test.new }

i = 0

puts "1 - TESTS[i] with i = 0:"
puts " orig: #{TESTS[i].val}"
TESTS[i].inc    # Increment to 1
puts " after inc: #{TESTS[i].val}"

puts
TESTS[0].reset  # Reset back to 0
puts "2 - TESTS[0]:"
puts " orig: #{TESTS[0].val}"
TESTS[0].inc    # Increment to 1
puts " after inc: #{TESTS[0].val}"

This produces the following output:

1 - TESTS[i] with i = 0:
 orig: 0
 inc before - val = 0
 inc after - val = 1
 after inc: 0

2 - TESTS[0]:
 orig: 0
 inc before - val = 0
 inc after - val = 1
 after inc: 1

In other words the behaviour is different if I use TESTS[0] and TESTS[i] with i = 0. I was expecting both to say after inc: 1. I assume that in the case of TEST[i] a copy is being made, but with TEST[0] it is not, but I have no idea what would cause that.
To make it even more confusing, when I change TESTS from a Tuple to an Array, the behaviour changes, and both print after inc: 0. Surely the behaviour should be the same whether it is a Tuple or Array?

Is this a bug, or am I just not understanding how Structs work?

1 Like

I guess you could consider it a bug. What apparently happens is that the compiler (not sure if Crystal itself or LLVM later) expands TESTS[1] at compile time to unpack the value, in other words rewrites it to a direct value access rather than to method call semantics. One could argue that in this case it should make sure to copy a value type first to preserve semantics.

1 Like

An access to a tuple element with a known index is inlined by the compiler so there’s no copy being made.

We have a similar situation with getters: if a method is just an instance variable access, that method is inlined and so if that ivar turns out to be a struct there will not be a copy.

So yeah, those are bugs.

However, I would simply try to avoid having mutable structs. If you have something mutable just make it a class.

2 Likes

Another corner case of mutable structs, that does not come up often, is that unions of mutable struct might misbehave in a similar way you are describing with tuples of mutable structs.

And for slices, since they are like safe pointers, it would be good for them to work. Currently slice[x].foo = val with struct types will not modify the underlying struct in the slice.

I those are the quirks with mutable structs. The compiler deals with literal tuple indices and with simple expression getters, but there are more cases.

1 Like

That’s basically how I’ve always thought about it, too: If it accumulates its own state outside of initialize, it’s better off as a class. If it never reassigns an ivar outside of initialize, it can be a struct — even if those ivars do accumulate their own internal state.

I like Crystal’s class vs struct a lot more than C++'s pass-by-reference vs pass-by-value semantics, where each method signature containing this type needs to know whether passing by value is safe. The decision of whether it can be passed by value is colocated with the information needed to make that decision, inside the class/struct itself. And I don’t need to update any method signatures to adapt to the change — it’s a +1/-1 diff. :100:

1 Like

Don’t you think that most people would expect the semantics of
s.method
where s is a simple variable holding a struct to be the same as
xxx_of_struct[idx].method
where xxx_of_struct is a class like Array, Tuple, Slice? I guess because [] is a method which in this case returns a struct, it actually returns a copy, but it is confusing IMHO, especially for people used to more conventional languages. Ideally it’d also be nice to be able to iterate over an array of structs without making copies of each element. Could there be a way for a method to pass back a struct by reference? Obviously such a struct would have to have been in existence when the method was called, but a class like Array would know that.

But then again, maybe it’s just opening up a can of worms… :smiley:

If the struct is not mutable this all makes no difference in practice, but it does seem that more copies of structs are made than you would think (or want), which must detract from their performance benefits. Which makes me wonder whether even
s = Strukt.new(...)
creates the struct and then does another copy to the variable.

1 Like

I think we should change the language to disallow mutable structs. But that would mean changing some std parts like Char::Reader and maybe a few others.

Crystsl allows some low level things like pointers and structs, but it’s mainly a high level language. When you do want to squeeze the last bit of performance then you can use those low level things but you have to be a bit more careful and conscious. That’s the tradeoff we chose. And we are not going to introduce more low level features like accessing struct references inside an Array.

That’s why I’d actually like the opposite: prevent the misuse of mutable structs by disallowing them in the first place.

2 Likes

It’s important to keep in mind that everything we’re doing involves method calls. There is no simple “accessing an index”, as in JavaScript or C.

And it’s not just an implementation detail, it’s fundamental to the design of the language. :slight_smile:

I’ve wondered this myself. :smiley: It sounds like a great idea, but when the method returns, the stack frame where that struct lived is gone so the memory location is no longer valid. Because everything in Crystal is method calls, doing anything with that struct instance is almost guaranteed to call another method, probably overwriting that memory location with a whole new stack frame.

There are absolutely times that classes are more performant than structs — I’ve encountered this in my own code. A struct’s holy grail is when it’s instantiated, passed to a couple methods, and thrown away.

If you call MyStruct.new infrequently but pass those few instances into deep call stacks, a class may very well be more performant because of all that copying, especially for structs that contain other structs. You should probably default to using classes and convert to structs iff that provides a significant, measurable performance improvement.

@jgaskins: It’s important to keep in mind that everything we’re doing involves method calls. There is no simple “accessing an index”, as in JavaScript or C.

And it’s not just an implementation detail, it’s fundamental to the design of the language. :slight_smile:

Sure. But if, for example, Array#[] in the example above could return a struct by reference, it should work the “expected” way.

@jgaskins: when the method returns, the stack frame where that struct lived is gone so the memory location is no longer valid

Yep, exactly why I said:

Obviously such a struct would have to have been in existence when the method was called

and there are a few obvious examples that spring to mind where this might be useful:

  • Returning self when self is a struct
  • Returning from Array# when the element is a struct

But to implement Arrays/Tuples of mutable structs as I was originally expecting, would require major changes, including allowing passing by reference (at least to yield) to allow things like:
array_of_struct.each { |s| ...<modify s>... }
to work, and as @asterite said it would probably be better to just not allow mutable structs. I would suggest, however, that the documentation on Structs needs to be improved. The Performance Guide currently says “Use structs when possible”, but doesn’t explain when they should and shouldn’t be used. That is really confusing for people new to Crystal (like me).

1 Like

I agree that code should do what we expect. That can be pretty subjective, though. If Array#[] worked the way you originally expected, it would be unexpected behavior for other people because of the typical interactions between structs and methods.

Gah, sorry, I totally missed that when reading through.

I agree, that’s pretty confusing. It might be worth opening a PR or an issue on the GitHub repo for the docs to discuss clarifying that section.

Mutable struct is not at all the same as mutable vector of structs, and I think it causes confusion. Immutable structs are absolutely fine, but a[x].value = 123 should work regardless, first thing that comes to mind is value=() returning a new immutable self.

As to the usefulness of structs it really comes down to the GC. If for any reason I need a deterministic memory model all I have is the stack, and as I can’t allocate a Reference on stack, I have to use structs. Well, short of mallocing and freeing pointers manually, at which point I’m fighting a tool, which is rarely a good idea.

We’ve recently had a short discussion on gitter along the same lines, and the result was an idea to extend Array#update to other containers which will not make a[x].mutate work, but will provide some way to use a mutable vector of structs in the form of a[x].update { |v| v.mutate }.

Looking at which structs are mutable in the std I found some bugs already. For example Iterator::Cycle is a struct but should be a class because it’s mutable.

Then there are cases when having a mutable struct helps improve performance a bit. For example Time::Format::Parser is a mutable struct but it’s only used in a single place like this:

  • create it
  • invoke a method on it (which accumulates data/mutates it)
  • get some result from it

It’s good that this is a struct because we save one extra allocation. If we force structs to be immutable we would need to use a class or some other, uglier, way.

One optimization is escape analysis: if the compiler can determine that the instance is created in that method and doesn’t escape that method, it could allocate it on the stack.

However, escape analysis isn’t easy to implement. So until then we could introduce something like unsafe_stack_new(...) that you could call on a class to allocate it on the stack. It’s unsafe because you have to make sure it won’t escape the method, but then we get a nice, intuitive language (structs are immutable, you can’t have problems with that) with an unsafe escape hatch to allocate mutable classes on the stack.

1 Like

but will provide some way to use a mutable vector of structs in the form of a[x].update { |v| v.mutate }

what will happen in update is that a copy of v will be yielded to the block and you would expect the block to return an updated version of v that would be then be inserted back into the Hash. You still won’t be able to directly mutate the structs that’s inside the Hash.

That’s fine, I’ll just return self from mutate. As for the proposed unsafe_stack_new, what’s gonna happen if I have an Array of those? The array itself can’t live on the stack, because its size is unknown, but will it contain pointers to instances on the stack (woah) or the instances themselves, provided they are fixed size?

I think that will hurt one’s ability to reason about what’s going on. I guess the problem is not easily solved, but probably the answer is somewhere in the expanded type system.

Structs are quite a nice construct, and appear in lots of other languages, so are pretty well understood. I am not sure that allowing a class to be allocated on the stack is really a nice alternative, though perhaps it is relatively easy to implement.

We currently have a limited form of mutable structs, which does not allow things like the following to work the way you might expect:

array_of_point.each {|point|
  point.x += x_offset
  point.y += y_offset
}

(where point is a struct) but this does work:

array_of_point.each_with_index {|point, i|
  point.x += x_offset
  point.y += y_offset
  array_of_point[i] = point
}

It’s not particularly nice, but it works. If structs become unmutable, this would have to be changed to:

array_of_point.each_with_index {|point, i|
  array_point[i] = Point.new(point.x + x_offset, point.y + y_offset)
}

That’s not so bad, but it gets worse with more complicated structs. Simple example would be a Triangle consisting of 3 corner points. To change anything about a Triangle requires rebuilding the entire struct, including all sub-structs.

Rather than unsafe_stack_new() and turning the above into classes, I think it would be better to somehow be able to cast an existing struct or a container of structs to allow the struct(s) to be mutable and to be passed by reference (at least with yield). Obviously the result would be unsafe. Array#update seems to be something along those lines. But I think you also need Struct#update, to deal with more complex structs containing structs. The result is in effect a pointer to the struct, but allowing “class-like” access. Example:

array_of_triangle.update.each {|triangle|
   corner1 = triangle.corner1.update
   corner1.x += x_offset
   corner1.y += y_offset
}
1 Like

For now I’m personally fine with this bit of a hack https://play.crystal-lang.org/#/r/7mhi

It’s inferior to not yet existing Hash#update as it has to do the lookup twice. It also doesn’t support levels of indirection so it can stay simple, but it allows to iterate an array just fine.

# looks like a.size.times is faster
a.each_index { |i| mutate a[i], x: 26, inner_y: 69 }