More on Symbols

In case anyone wants to try, all that is needed to implement dynamic Symbols is the ability to query the number of predefined Symbols:

# src/compiler/crystal/codegen/primitives.cr
class Crystal::CodeGenVisitor
  def codegen_primitive(call, node, target_def, call_args)
    @call_location = call.try &.name_location

    @last = case node.name
            # ...
            when "symbol_predefined_count"
              int(@symbols.size)
            else
              raise "BUG: unhandled primitive in codegen: #{node.name}"
            end

    @call_location = nil
  end
end

# src/primitives.cr
struct Symbol
  @[Primitive(:symbol_predefined_count)]
  def self.predefined_count : Int32
  end

  # renamed from `#to_s`
  @[Primitive(:symbol_to_s)]
  protected def to_s_primitive : String
  end
end

# src/symbol.cr
require "string_pool"

struct Symbol
  @@strings = StringPool.new(Math.pw2ceil(predefined_count))
  @@s_to_sym = Hash(String, Symbol).new(initial_capacity: predefined_count).compare_by_identity
  @@i_to_s = Array(String).new(predefined_count)

  private def self.add_symbol(str : String) : self
    i = @@i_to_s.size
    @@i_to_s << str
    @@s_to_sym[str] = i.unsafe_as(Symbol)
  end

  private def self.init_string_pool
    predefined_count.times do |i|
      add_symbol(@@strings.get(i.unsafe_as(Symbol).to_s_primitive))
    end
  end

  init_string_pool

  def self.new(str : String)
    str = @@strings.get(str)
    @@s_to_sym.fetch(str) { add_symbol(str) }
  end

  def to_s : String
    @@i_to_s[to_i]
  end

  def self.each(& : Symbol ->)
    @@i_to_s.size.times do |i|
      yield i.unsafe_as(Symbol)
    end
  end

  def self.all_symbols : Array(Symbol)
    Array.new(@@i_to_s.size, &.unsafe_as(Symbol))
  end

  def to_sym : self
    self
  end
end

class String
  def to_sym : Symbol
    Symbol.new(self)
  end
end

Symbol.all_symbols # => [:sequentially_consistent, :xchg, :skip, :none, :unchecked, :add, :active, :done, :to_s, :file]

a = "xchg".to_sym
b = "xchg".to_sym
a.to_i     # => 1
b.to_i     # => 1
:xchg.to_i # => 1

a = String.build(&.<< "foo").to_sym
b = String.build(&.<<("f").<< "oo").to_sym
a.to_i # => 10
b.to_i # => 10

Symbol.all_symbols # => [:sequentially_consistent, :xchg, :skip, :none, :unchecked, :add, :active, :done, :to_s, :file, :foo]

The counterargument is that if Symbols are used in this capacity, then one can simply use the Strings and the StringPool directly without pulling in all the predefined Symbol constants, thereby avoiding global state, and the memory usage / performance will be the same. One key difference, however, is that the entire StringPool can be garbage-collected even if its elements cannot be removed individually. In fact I did this in a custom JSON (de)serializer to reduce the generated document’s size (with a hard limit on the number of distinct strings).
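For illustration, here is a minimal sketch of that counterargument, using the standard library’s `StringPool` directly instead of a global Symbol table. The variable names are made up for this example and are not taken from the serializer mentioned above.

```crystal
require "string_pool"

# A local pool: unlike a global Symbol table, it can be garbage-collected
# as a whole once nothing references it any more.
pool = StringPool.new

a = pool.get("foo")
# Build an equal but distinct String at runtime, then intern it.
b = pool.get(String.build { |io| io << "fo" << "o" })

a == b     # => true (same contents)
a.same?(b) # => true (same object, so an identity comparison is enough)
pool.size  # => 1
```

Because both lookups return the same object, a Hash using `compare_by_identity` (as in the Symbol sketch above) would work on pooled strings as well.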

We tackled the operator token issue recently, and we are now pretty close to removing all Symbol-typed variables in the standard library and the compiler, other than API changes. These are #11775, #11020, and the following:

# src/time/location.cr

class Time::Location
  struct Zone
    # Prints `#offset` to *io* in the format `+HH:mm:ss`.
    # When *with_colon* is `false`, the format is `+HHmmss`.
    #
    # When *with_seconds* is `false`, seconds are omitted; when `:auto`, seconds
    # are omitted if `0`.
    def format(io : IO, with_colon = true, with_seconds = :auto)
      # ...
    end

    # Returns the `#offset` formatted as `+HH:mm:ss`.
    # When *with_colon* is `false`, the format is `+HHmmss`.
    #
    # When *with_seconds* is `false`, seconds are omitted; when `:auto`, seconds
    # are omitted if `0`.
    def format(with_colon = true, with_seconds = :auto)
      String.build do |io|
        format(io, with_colon: with_colon, with_seconds: with_seconds)
      end
    end
  end
end

with_seconds is unrestricted, but the method body expects it to be one of true, false, or :auto. If we deprecate these as well then symbols can only appear in compile-time contexts, e.g. as arguments to #responds_to?; I think these are fine, because those symbols emphasize their compile-time aspect.
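A rough sketch of what an enum-restricted variant could look like. `SecondsMode` and `format_offset` are hypothetical names invented for this example, not part of the standard library, and the method is reduced to the offset formatting itself. Callers can still write `:auto` thanks to symbol autocasting, but a typo becomes a compile-time error.

```crystal
# Hypothetical enum replacing the `true | false | :auto` convention.
enum SecondsMode
  Always
  Never
  Auto
end

# Hypothetical stand-in for `Time::Location::Zone#format`; *offset* is in seconds.
def format_offset(offset : Int32, with_colon : Bool = true,
                  with_seconds : SecondsMode = SecondsMode::Auto) : String
  sign = offset < 0 ? '-' : '+'
  hours, rest = offset.abs.divmod(3600)
  minutes, seconds = rest.divmod(60)

  # Exhaustive case: the compiler checks that every member is handled.
  show_seconds = case with_seconds
                 in .always? then true
                 in .never?  then false
                 in .auto?   then seconds != 0
                 end

  String.build do |io|
    io << sign << hours.to_s.rjust(2, '0')
    io << ':' if with_colon
    io << minutes.to_s.rjust(2, '0')
    if show_seconds
      io << ':' if with_colon
      io << seconds.to_s.rjust(2, '0')
    end
  end
end

format_offset(3600)                         # => "+01:00"
format_offset(3600, with_seconds: :always)  # => "+01:00:00"
format_offset(5430, with_colon: false)      # => "+013030"
```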

I miss symbols a lot because I also used them as a type distinct from strings in Ruby. For example, a method would react differently depending on whether it received a numeric, a symbol or a string. I used symbols especially for things like user ids, match codes, etc., whenever it was important that those were unique and that one couldn’t stand for different objects. This can be done manually of course, but with symbols I got it completely for free and without any further action on my side. And sometimes I used them really just as a second type of string. Imo they also make code far more readable, while also drawing the focus to a different type of thing: a symbol is primarily for the dev, while strings are usually for the user of an application (and need translations if multiple languages have to be supported).

Imo they bring a lot more benefits than just a performance boost for Ruby. And while Crystal might not need them for the performance boost, I think that Crystal loses a lot by not having them available at runtime for those additional benefits.

If Crystal doesn’t want to support them the way people are used to them in Ruby, I can fully accept that, but in that case I would suggest getting rid of them completely as a core feature, so people who really want them can have their own Symbol class (actually named Symbol). It would be great if it were somehow possible to let those classes continue using the :foo syntax.

But tbh I don’t see a reason why they shouldn’t be offered just the way people are used to them. Those who don’t know symbols probably won’t use them anyway. And those who miss them would appreciate them a lot. I just don’t see how they would hurt anyone if they were fully supported.

Could you give an example of such an API?

This is by far the most important thing they do for me.

YES! Thank you for extracting thoughts directly from my brain and putting them in words.

Personally I do not like symbols at all. People invented hash with indifferent access just to bypass it. I would use enums instead of symbols.

I don’t know whether it’s used this way in any open source projects, or which ones those would be, but here, for example, is something I use exactly like this, or with small modifications, in many of my projects.

# here I use symbols as a second type of String

class X
  class << self
    def [](x, *y)
      x = case x
          when String
            {name: x}
          when Symbol
            {match: x}
          when Integer
            {id: x}
          when Hash
            x
          else
            raise UnknownParameterType
          end
      select x
    end
  end
  #…
end

class Customer < X
#…
end

class Supplier < X
#… 
end

class Bank < X
# …
end

# Customer["John Doe"]      # their name
# Customer[:C024578]        # their customer number
# Customer[3852]            # internal number

# Bank[:SRLGGB2L]       # SWIFT code

Also, to me a string has a very different meaning than a symbol: a string is something I might translate or adjust depending on context. A name, for example, could be adjusted, probably not when addressing the person in a letter, but when I talk about them. In a publication talking about which customers one has, or in the references of a CV, one might translate names, and “the White House” will likely get translated in any non-English news report. Strings also get adjusted grammatically for the individual use case (singular/plural/etc.) - but none of that happens with symbols, because they only serve the developer. They might occasionally end up in strings, but that’s not their main purpose; usually they stay in the code, or end up in a database or maybe even some config file. A typo in a string (no matter if on one or all occasions) is a bug. A typo in a symbol is unfortunate (as it will lead to bugs) but is no bug as long as it’s written incorrectly the same way every single time.

In the same way: even though they were introduced to Ruby for performance reasons, I actually don’t know any project which uses them specifically for that. If the same string gets used over and over (and never serves any other purpose than eventually being output to the user), I can’t remember even one occasion where that string was stored as a symbol in order to boost performance. Take an email signature, for example: it might get cached, but I haven’t seen a symbol made of it even once (a better example would probably be the publisher’s name of a website, which is used over and over on its pages). It might give some boost, but I haven’t seen it used this way for this purpose even once.

With syntax highlighting, usually only symbols are relevant to me while I’m developing, as the program flow depends on them - strings, on the other hand, can be completely ignored; they might later affect the actual output, but not the program flow. Granted, there are a few very specific exceptions: ARGV, file names and URIs. But even with URIs I’m used to having the “first part” (whatever it is called) available as a symbol, as by its meaning it’s a symbol type and not a string (in this specific case something which could be an enum value, as in many but not all uses of symbols). During development, most strings are just lorem ipsums, or at least I don’t care about typos and such (unless I already have to take care of the layout). Strings, including URIs and file names, usually don’t get any of my attention until directly before the first preview is sent to a customer (to remove the worst placeholder texts, or to insert placeholders which help the customer understand the intention of some text). Most of the texts (as long as they’re not just labels or error messages) will be supplied by the customer anyway.

If we were talking about some 3D game, strings would be the responsibility of the guys with the Wacom tablets, styluses and Adobe subscriptions, while those who deal with the symbols would write the actual code.

There are just situations in which an enum wouldn’t fit (or not even work, as it’s something which might appear only at runtime) and which are just semantically very different from a string. Just one example, where the same can be found in Crystal’s own syntax: fun foo="Foo_foo"(…) : …. foo and Foo_foo are somewhat similar in their meaning, but they follow different rules (like foo couldn’t use a capital letter) and each also serves a very different receiver (one has to make the Crystal compiler happy, the other a library), but so do usually strings, too.

TL;DR: Actually, and I guess I speak for all the other “symbol admirers” as well, it doesn’t need to be implemented in Crystal in a proper symbol way (as in: each symbol gets created just once). I’m absolutely fine if the handy :foo syntax stays (which also gives syntax highlighting) and if class, typeof and is_a? report a Symbol for it. If you just create a second pseudo String class, and a new instance gets created every time I write :foo or "foo".to_sym, I’m fine with it (two symbols should be equal if their string representations are equal; a string and its symbol should not be considered equal). Crystal is so lightning fast anyway that I’m happy with the tiny performance toll I might have to pay (and everyone who doesn’t want to pay this toll can just avoid symbols in their code).

Enums often don’t fit (but I admit, often or at least sometimes they would), but even worse: they can’t be created at runtime either.

No, can’t agree with this. People invented hash with indifferent access as a convenience for the specific use case of symbols as hash keys, and I can see why.
In untidy code-bases where no style guide exists or is ignored, you might have hashes with either.
Or 3rd party code might introduce one or the other into your application.
In that case, indifferent access makes total sense.

But you can absolutely not draw the conclusion from this that “it was invented to bypass symbols” as a general concept.

Enums are close, but enums imply by name and declaration a group of related things (although you can of course declare only one).
Like Enums, symbols are, well, symbols.

But Enums can be translated to something.
Symbols are just themselves. Like a function name is not a string, it is a symbol (by philosophy, if not code).

I find symbols insanely useful in Ruby and I use them all the time, not by default, but with intent.
If I use a symbol, I say “this thing is now identified by this :thing”. Be it a state in a state machine, be it a flag, whatever.
As an example: my functions don’t return {status: "success"} but {status: :success}.
Because they don’t return a string. They return the information that something succeeded.

I think this distinction is valuable, and it adds even more valuable differentiation when looking through code.

You can always use enums and strings to replace symbols. If you really have to use symbols, you can use a custom class instead.
I consider symbol to be a useless language primitive that causes more trouble than benefit, so I would prefer removing it from Crystal.
Python doesn’t have a symbol type and works fine.

I wish it could be removed from Ruby someday.

Python in general works fine, why don’t you just use that?
The answer is probably: because Crystal has features that I like.

So, using Python as an argument why Crystal should not have symbols is just null and void.

What troubles does the existence of symbols in crystal cause? Any examples?

So does brainfuck, even without any characters - by that logic they could be removed as well. We don’t need classes, objects, arrays, and 95% of the stdlib, or things like the interpreter, formatter or doc generator either - but we are glad to have them, because they make our lives a lot easier.

I fully accept it if someone weighs the pros and cons of symbols differently, but the fact that something else can do without them is really no argument (as long as no one claims it would be impossible to go without them, which nobody did).

Customer[:C024578] # their customer number

That’s a bit strange in my opinion. Wouldn’t someone want to search a customer by their number? That means that the argument will be dynamic. How do you handle that in that case? Do you accept a string from users and turn it into a symbol?

I would prefer that to be:

Customer.find_by_customer_number(...)

It’s a bit more verbose, but nobody on the team has to learn the implicit rule that Symbol means “customer number”, or “swift code” depending on the class you are using. Less things to remember is always better in my opinion!

Symbols are similar to a globally available Enum with a lot of values. Defining a symbol just means adding a new global Enum member; it’s just syntactic sugar.
Creating dynamic symbols is the same as creating global values, but Crystal doesn’t have a GIL, so that is bad for multithreading.

You should always use a domain specific Enum instead of symbols when feasible.

Hash with indifferent access is the biggest trouble with symbols.
And the trouble of abusive use of symbols as a convenient Enum is another one.

In the past the talk has been to possibly not remove symbols entirely, but instead relegate them to the more specific purpose of enum autocasting, as that’s really all I personally use them for.

While this may make sense for you, would it also make sense for someone reading your code? I’m kinda with @asterite on this one. Being a bit more verbose while making the purpose a bit more explicit can go a long way in having readable code. E.g. use a dedicated struct type to represent a unique identifier such as ID.new "C024578". Now you can document what an ID is and such.
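As a sketch of the dedicated-type idea, the implicit “Symbol means customer number” rule can be replaced by small wrapper structs and overloads. Everything here (`CustomerNumber`, `SwiftCode`, `Customer.lookup` and the returned strings) is hypothetical and only illustrates the dispatch; a real lookup would query a data store.

```crystal
# Hypothetical wrapper types; `record` defines an immutable struct with a getter.
record CustomerNumber, value : String
record SwiftCode, value : String

class Customer
  # One overload per kind of key; the compiler picks the right one
  # from the argument type, so no `case` on the type is needed.
  def self.lookup(name : String) : String
    "customer by name #{name}"
  end

  def self.lookup(number : CustomerNumber) : String
    "customer by number #{number.value}"
  end

  def self.lookup(id : Int32) : String
    "customer by internal id #{id}"
  end
end

Customer.lookup("John Doe")                    # => "customer by name John Doe"
Customer.lookup(CustomerNumber.new("C024578")) # => "customer by number C024578"
Customer.lookup(3852)                          # => "customer by internal id 3852"
```

Passing a `SwiftCode` to `Customer.lookup` would be rejected at compile time, since no overload accepts it, and each wrapper type gives a place to document what the identifier means.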

Wouldn’t a Status enum be better suited for this? Then the compiler can actually help prevent typos, gives you a place to document the possible values for that field, and/or add additional helper methods if needed.

The one thing I would love to have is string backed enums. In most cases an enum is used to represent some particular state where the actual value of the enum doesn’t really matter. But there are also cases where a specific string value may also be expected. Being able to do something like:

enum Suit : String
  Hearts   = "H"
  Diamonds = "D"
  Clubs    = "C"
  Spades   = "S"
end

Would be a great addition for the cases where you want to strongly type a string field with a set of allowed values.
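Until something like that exists, a string-backed enum can be approximated with a regular enum plus two helper methods. This is only a sketch: `code` and `from_code?` are names invented here, not existing Enum API.

```crystal
enum Suit
  Hearts
  Diamonds
  Clubs
  Spades

  # Hypothetical helper: the string a `Suit : String` enum would carry.
  # The exhaustive `case ... in` fails to compile if a member is missing.
  def code : String
    case self
    in .hearts?   then "H"
    in .diamonds? then "D"
    in .clubs?    then "C"
    in .spades?   then "S"
    end
  end

  # Hypothetical helper: reverse lookup, returns nil for an unknown code.
  def self.from_code?(code : String) : Suit?
    values.find { |suit| suit.code == code }
  end
end

Suit::Hearts.code    # => "H"
Suit.from_code?("S") # => Suit::Spades
Suit.from_code?("X") # => nil
```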

To provide an opinion from a different angle - I like symbols because they are visually different than enums or strings, and are almost exclusively used to represent something important. When visually scanning a Crystal file, enums can be indistinguishable from class names or constants, and keys represented as strings are indistinguishable from other strings in the file (e.g. exception messages). Having symbols represented differently visually makes it easy to pick them out, and they almost always represent something from the domain being worked on (and so almost always have a domain definition behind the word I can look up for more context).

Enums would be preferable in most cases where you have an important key, but defining an enum requires you to know all possible values of the enum ahead of time. It’s not always appropriate or desirable to go through that exhaustive effort when 1) You might not need to use all values of the resulting enum anyways and 2) You don’t know which values ahead of time you’ll need either. Being able to devolve to symbols helps that from a developer perspective. Especially when you don’t have a need to limit the possible values either.

I would love that feature. I come from a Java background and love how enums are essentially classes there - you can give them constructors and a list of arguments. A single enum could represent a bunch of relevant properties, and so rather than relying on a case statement (or a hash, I suppose) to convert the enum to the relevant property individually, you can just get the property from the enum directly. NamedTuple backed enums I guess would be the closest to that idea (or struct backed enums? Probably too much)

Values that don’t correspond to an enum’s constants are allowed.
https://crystal-lang.org/reference/1.5/syntax_and_semantics/enum.html#enums-from-integers
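A small example of the behaviour described on that page, using a made-up `Color` enum; `from_value?` is shown as the checked alternative.

```crystal
enum Color
  Red
  Green
  Blue
end

known = Color.new(1)
unknown = Color.new(10) # allowed, even though no member has the value 10

known         # => Color::Green
unknown.value # => 10
unknown.to_s  # => "10"

# `from_value?` returns nil instead of an out-of-range enum value.
Color.from_value?(1)  # => Color::Green
Color.from_value?(10) # => nil
```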

The “just” in the second part of that sentence basically dismisses the whole reason why I use and like Ruby and Crystal.

I don’t need nor asked for dynamic creation of symbols.

Is there a reason for this? (btw, enums are also not dynamically created.)

No, Hash with indifferent access is no problem at all.
See? I too can just write down statements.

If something is convenient in a programming language, that sounds like a good thing to me.

Symbols make editing and reading and understanding code better.
You see at a glance that this is not some random string.
You don’t have to type the whole module name like with enums, and :success is always == :success no matter what module or context you are currently in.
Symbols help distinguish possibly mutable string parameters or returns, from defined and immutable parameters and returns.
Enums can generally be used in place of symbols, but they cover a different set of cases, and replacing symbols with them loses some of the unique advantages of symbols.
Crystal would be a poorer, less beautiful language without them.

@tsornson and @mavu and everyone else who misses symbols:
I wrote a shard to get symbols back. It’s a bit wonky, not as good as something out of the box, you probably shouldn’t run a bank or any life-supporting equipment with it, but imo it’s better than nothing.

Be warned: after 32768 self-made symbols it might overwrite Crystal’s own symbols (if you had those already maxed out). And should the number of symbols ever exceed 65536, it won’t end well for sure. That’s very sad, as it effectively means you shouldn’t use them as universally as you could in Ruby (as keys for whatever - symbolize_names: true comes directly to my mind), but it already gives back a lot of comfort, and as long as you handle only your own stuff, I see no reason why you shouldn’t use symbols in your config files either (as they hopefully won’t have more than 30k keys) 🙃
This limitation could actually be lifted, but it would have its price: your new symbols would no longer be compatible with Crystal’s symbols (that’s why I didn’t go this route yet, but I’m seriously considering it). With the current approach they should be perfectly compatible, as far as I have seen them used in Crystal. If any of you thinks “screw it, let’s go for unlimited”, I’m in (we might have to monkeypatch a few things, but I don’t think this would be more effort than what is currently necessary to replace the symbols in Ruby code that is being ported).

Edit: My bad. In all the excitement I got my numbers wrong. I guess you don’t need to worry much about exceeding the limit, although there is one: create 2147483648 symbols without worries, be worried for the same number again - and after that it will end badly.

So currently a traditional symbol :a (which existed at compile-time) will compare equal to a symbol you might create dynamically at run-time by "a".to_sym.

Link to the shard will be added to this reply within the next few hours (I’m going to play a bit more with it first, but it seems to work absolutely fine so far). Using the new feature will only need a require "symbolx" and nothing else 😀

I have now been creating literally millions of symbols for an hour or so and it’s been pure joy with no surprises or issues, so it seems to be all fine.

And here it is: kjarex/symbolx.cr
Download it and require it accordingly, or if you wanna go the shards way, either do a rock add symbolx once it’s indexed or add this to your shard.yml manually:

dependencies:
  symbolx:
    github: kjarex/symbolx.cr

Then in your code, just add require "symbolx" and you’re all set!! Enjoy!
