[RFC] Type Safe Stringification

Blacksmoke16 · November 4, 2022, 6:18pm

Problem Statement

Object stringification in Crystal is currently handled via all objects having a #to_s method. This works well in that you can always call obj.to_s to get a string representation of it. However, it has some cons as well.

It is not immediately clear that you need to implement #to_s(io : IO) : Nil and not #to_s : String to customize that behavior
There is no way to know if an object has a meaningful string representation at all, mainly for custom user classes/structs since obj.responds_to? :to_s is always true
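A minimal sketch of both points, using two hypothetical classes (Plain and Point are illustrative names, not from the stdlib): overriding #to_s(io : IO) : Nil is what feeds obj.to_s and string interpolation, yet responds_to? cannot distinguish the customized type from the one that relies on the default.

```crystal
# Plain relies on the default implementation inherited from Reference.
class Plain; end

# Point customizes its representation by overriding #to_s(io), not #to_s.
class Point
  def initialize(@x : Int32, @y : Int32); end

  def to_s(io : IO) : Nil
    io << "(" << @x << ", " << @y << ")"
  end
end

# Both types respond to #to_s, so this check tells us nothing.
Plain.new.responds_to?(:to_s)    # => true
Point.new(1, 2).responds_to?(:to_s) # => true

# The argument-less #to_s and interpolation both go through #to_s(io).
Point.new(1, 2).to_s  # => "(1, 2)"
"at #{Point.new(1, 2)}" # => "at (1, 2)"
```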

For example, say you are doing something with sprintf, or anything that renders/prints data:

def render(content) : String
  sprintf "(%s) - %s", UUID.random, content
end

Where you want the content in this case to be anything that has a meaningful string representation. The key word here being “meaningful”. The only type safety you can get around this is by either dropping the type restriction and calling #to_s on whatever is passed, or simply making it String. The latter would require the user to manually call #to_s on the object before passing it in, even if it overrides #to_s(io). Neither of these solutions prevents unintended stringification. E.g.

require "uuid"

class Foo; end

def render(content) : String
  sprintf "(%s) - %s", UUID.random, content
end

render Foo.new # => (9f4a57c3-9a5f-4a2a-bcc6-a613b2b51b9a) - #<Foo:0x7f16e241eea0>

Here the Foo object is rendered as #<Foo:0x7f16e241eea0>, which is an obvious bug that could very easily go unnoticed unless you happen to have a test case for it.

Proposal

A possible way to improve upon these issues is to have a dedicated Stringable module that can be implemented within types to denote that they have a meaningful string representation. This would be easy enough for the majority of stdlib types, but most useful for custom user classes/structs. Ultimately this can allow for more type safety and reduce the number of bugs due to unexpected stringification.

Continuing with our previous example, you could update the type restriction of #render to be def render(content : Stringable) : String. This would now raise a compile time error if you tried to pass something that doesn’t have a meaningful string representation. In regards to sprintf itself, you could in theory update its signature to be def sprintf(format_string, *args : Stringable) : String to obtain a similar guarantee. Where sprintf "%s - %s - %s", 123, "bob", UUID.random would be fine, but sprintf "%s - %s - %s", 123, "bob", Foo.new wouldn’t be unless it implements the module.
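As a rough sketch of the idea, assuming Stringable is a plain marker module (it does not exist in the stdlib; the reopened Int32, String, and UUID below only stand in for what the proposal would ship):

```crystal
require "uuid"

# Hypothetical marker module from the proposal; not part of the stdlib.
module Stringable; end

# Stand-ins for "added to common stdlib types".
struct Int32
  include Stringable
end

class String
  include Stringable
end

struct UUID
  include Stringable
end

# A user type that never opted in.
class Foo; end

def render(content : Stringable) : String
  sprintf "(%s) - %s", UUID.random, content
end

puts render(123)
puts render("bob")

# Uncommenting the next line is expected to fail at compile time,
# because Foo does not include Stringable and no overload matches:
# render Foo.new
```

Note that the restriction only checks that the module is included. It says nothing about whether the type actually overrides #to_s(io).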

The other benefit would be to better enforce that users implement the correct override, #to_s(io) : Nil, rather than #to_s : String.

Considerations

The main implementation issue around this is that because all types inherently implement to_s(io : IO) : Nil, you can’t just have the module be abstract def to_s(io : IO) : Nil. Nor can we just drop the default implementation as that would (probably?) break existing code.

Because of this you’d have to do something like:

Have the module implement to_s(io : IO) : Nil itself, but require you to define something like to_str(io : IO) : Nil
Some macro logic to assert the method is overridden in the including type, or a child of it
Some macro logic to assert the method’s definition isn’t an ancestor/default implementation
???
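The first option in the list above could look roughly like the following. This is only a sketch under the assumption that the module is named Stringable and the required method is to_str, as floated above; Contact is a hypothetical user type.

```crystal
# Hypothetical module: it owns #to_s(io) and delegates to an abstract #to_str(io).
module Stringable
  abstract def to_str(io : IO) : Nil

  def to_s(io : IO) : Nil
    to_str(io)
  end
end

class Contact
  include Stringable

  def initialize(@name : String, @email : String); end

  # Required by Stringable; omitting it is a compile-time error
  # because the abstract def would be left unimplemented.
  def to_str(io : IO) : Nil
    io << @name << " <" << @email << ">"
  end
end

def render(content : Stringable) : String
  "[#{content}]"
end

puts render(Contact.new("John Smith", "john.smith@gmail.com"))
# prints: [John Smith <john.smith@gmail.com>]
```

The trade-off is that implementers have to learn a second method name, and nothing stops a type from also overriding #to_s(io) directly and bypassing #to_str.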

Future

Longer term, if so desired, the module could also be made required if we ever wanted to remove the global default to_s method.

Summary

Add a new module interface that denotes a type has a meaningful string representation
No automatic/implicit casts of Stringable types to String parameters
Backwards compatible, added to common stdlib types
Prevents incorrect #to_s definition

asterite · November 4, 2022, 7:08pm

Here’s my opinion: Why I love Ruby: string representation - DEV Community 👩‍💻👨‍💻

Note: it’s just my opinion :-)

Blacksmoke16 · November 4, 2022, 7:22pm

I think this wouldn’t really change that. Currently Crystal has 3 different ways to print a string representation of an object:

to_s
inspect
pretty_print

One could argue the latter two are 99% of the time what you’d want for debugging, and should be implemented by default. But not every object is going to have a meaningful string representation. Which is where Stringable could fit in longer term.

Related: [RFC] Specify `#to_s` vs `#inspect` · Issue #9966 · crystal-lang/crystal · GitHub

HertzDevil · November 4, 2022, 7:22pm

What isn’t a meaningful string representation?

Blacksmoke16 · November 4, 2022, 7:24pm

Given a class like:

class User
  getter name : String
  getter email : String

  def initialize(@name, @email); end
end

A meaningful representation, IMO, would be something like John Smith <john.smith@gmail.com>, whereas #<User:0x7f16e241eea0> isn’t very meaningful to the end user, as it’s more of a debugging/internal representation. I.e. it is not very useful within an error message or something along those lines.
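For illustration, here is one way the User class could provide that representation by overriding #to_s(io). The format is just the one suggested above, and #inspect is left at its default so the debugging output stays available.

```crystal
class User
  getter name : String
  getter email : String

  def initialize(@name, @email); end

  # End-user facing representation; #inspect is intentionally left untouched.
  def to_s(io : IO) : Nil
    io << @name << " <" << @email << ">"
  end
end

user = User.new("John Smith", "john.smith@gmail.com")

puts user                 # prints: John Smith <john.smith@gmail.com>
puts "Invite sent to #{user}" # prints: Invite sent to John Smith <john.smith@gmail.com>

# The default #inspect still shows the internal form, with the object id
# and instance variables, which is what you usually want when debugging.
p user
```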

asterite · November 4, 2022, 7:38pm

I guess this would also be used for string interpolation, right? That User output isn’t useful in interpolations either.

asterite · November 4, 2022, 7:41pm

I’m thinking that maybe %s should only work with strings. That would maybe fix this problem? to_s is just one way to turn something into a string. For business logic you might need different representations.

HertzDevil · November 4, 2022, 7:42pm

Developers are also end users. If this change were done I would simply make all types include Stringable.

Also one purpose of #to_s is to define a canonical string conversion in the absence of extra arguments; it is by design that sprintf("%s", x) must work for any x. John Smith <john.smith@gmail.com> is tied to a particular formatting within specific business logic and is not a canonical representation. So you should define your own modules and not rely on splitting Object#to_s into a “meaningful” group and a “meaningless” group.

I fail to see why (9f4a57c3-9a5f-4a2a-bcc6-a613b2b51b9a) - #<Foo:0x7f16e241eea0> is an “unintended stringification” that deserves to be prevented by the standard library’s sprintf.

Blacksmoke16 · November 4, 2022, 8:01pm

I should clarify that sprintf was just an example. Nothing needs to change about how it works at this moment and this module could be added in a 100% backwards compatible way.

Let’s switch to a different example that made me think of this. Say you have a library to render a table to the terminal. Each cell consists of a string. The API for defining the rows/cells ideally would allow passing values to use. E.g.

table.add_row 123, "hello"

To provide some type safety, you could make this definition add_row(*cells : String), but would require you to do 123.to_s, "hello", which is fine, if a bit verbose.

The Stringable type would be available to type it as add_row(*cells : Stringable), rejecting user types that do not have a specific representation in the user’s business logic and preventing you from unintentionally using a non-stringable object as the cell’s content.

The module could be added to all the scalar/collection types in the stdlib to make them work out of the box. You could also retain the same behavior as today by just not restricting it at all and call .to_s on everything as you can now.

EDIT: I guess in short what Stringable really represents is a standardized way to know that a custom string representation for an object exists. Especially for user defined structs/classes, as there’s no way to tell whether it was customized or if it’s using the default implementation.

HertzDevil · November 4, 2022, 8:12pm

Then the appropriate action is to rely on a custom method like #to_table_row and forward definitions for common types to #to_s. Something usable for a table row is not necessarily usable in every other set of business logic rules; defining a single Stringable module will not make this problem go away.
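A sketch of what that suggestion might look like in a table library. Everything here is hypothetical: the TableRowCell module, the Table class, and its add_row signature are illustrative names; only #to_table_row comes from the suggestion above.

```crystal
# Hypothetical library-owned interface for anything that can be a cell.
module TableRowCell
  abstract def to_table_row(io : IO) : Nil
end

# Forward common types to their existing #to_s(io).
struct Int32
  include TableRowCell

  def to_table_row(io : IO) : Nil
    to_s(io)
  end
end

class String
  include TableRowCell

  def to_table_row(io : IO) : Nil
    to_s(io)
  end
end

class Table
  getter rows = [] of Array(String)

  # Only values that opted in to TableRowCell are accepted.
  def add_row(*cells : TableRowCell) : Nil
    @rows << cells.map { |cell| String.build { |io| cell.to_table_row(io) } }.to_a
  end
end

table = Table.new
table.add_row 123, "hello"
p table.rows # prints: [["123", "hello"]]
```

Because the interface belongs to the library, a user type has to implement #to_table_row explicitly, so the default #to_s output cannot end up in a cell by accident.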

straight-shoota · November 5, 2022, 10:38am

I agree with @asterite that it’s a good thing that all values have a default stringification. And it should not be considered an error to use that. It’s a feature. Even if it sometimes does not add a lot of value.
But then it’s up to the owner of the respective type to provide a meaningful implementation. If they think the default implementation is meaningful, then that’s it. Even if you forced them to implement a Stringable interface, that by itself wouldn’t guarantee anything more with regard to being meaningful. It only enforces that stringification is implemented somehow. It might be just as meaningful (or meaningless) as #<Foo:0x7f16e241eea0>.

There are also practical problems with this approach when it comes to type hierarchies. When a parent type implements the stringification method, the interface is satisfied. This holds even if subtypes add a lot of other state that the parent’s implementation doesn’t cover, making the resulting stringification largely meaningless.

So even if it was desired, I don’t think Stringable would be an effective tool for a significant improvement.

straight-shoota · November 5, 2022, 10:53am

Also, whatever is considered meaningful depends very much on context. Your table format example has very specific needs that can be entirely different in another application.
As an example: Time supports many different formatting options which can produce a ton of different meaningful stringifications for different use cases.
#to_s is supposed to be a good default, but it doesn’t fit everywhere. If a specific application has specific needs for a generalized stringification it should either take care of that itself (works only for known types) or define an interface for other types to implement (such as #to_table_row suggested two comments above).
I think this is a much superior solution to relying on one single way for all stringification needs. The power is obvious when the stringification interface is made context-aware. That way you can pass in localization options, user settings, or UI context (such as the available space in the table column).
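To make the context-aware idea concrete, here is a small sketch. The CellContext record, the ContextualCell module, and the #to_table_cell method are all made-up names for illustration; the only stdlib call relied on is Time#to_s(io, format).

```crystal
# Hypothetical UI context handed to every cell when it is rendered.
record CellContext, width : Int32

# Hypothetical application-defined interface that receives the context.
module ContextualCell
  abstract def to_table_cell(io : IO, context : CellContext) : Nil
end

struct Time
  include ContextualCell

  # Pick a format that fits the available column width.
  def to_table_cell(io : IO, context : CellContext) : Nil
    if context.width >= 19
      to_s(io, "%Y-%m-%d %H:%M:%S")
    else
      to_s(io, "%Y-%m-%d")
    end
  end
end

def render_cell(value : ContextualCell, context : CellContext) : String
  String.build { |io| value.to_table_cell(io, context) }
end

time = Time.utc(2022, 11, 5, 10, 53, 0)
puts render_cell(time, CellContext.new(width: 24)) # prints: 2022-11-05 10:53:00
puts render_cell(time, CellContext.new(width: 10)) # prints: 2022-11-05
```

The same value yields two different, equally meaningful strings depending on the context, which a single #to_s cannot express.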

Blacksmoke16 · November 5, 2022, 4:26pm

I guess my assumption was in most cases the stringable object would be domain specific. So like there would be a type dedicated to storing the structure cell content, versus using a type that’s also used for other things. In which there could be some merit in reusing the same to_s method since the type itself is dedicated to the use case, versus tuning the method to the use case.

But yea, ultimately, due to the base type already having to_s(io) defined and it not guaranteeing the string representation is meaningful, Stringable doesn’t really provide any benefit.

straight-shoota · November 5, 2022, 8:36pm

If it’s domain specific, you can do whatever you like. A generic type in stdlib is neither necessary for this, nor would it have any real benefit.

For example:

module MyShard::Stringable
  abstract def stringify(io : IO)

  def to_s(io : IO)
    stringify(io)
  end
end

rogerdpack · November 9, 2022, 5:27am

Maybe could only define #inspect if it’s a human friendly output?
