More on Symbols


#1

I am fairly new to Crystal, having been a Ruby programmer for many years. One of the main reasons I looked at Crystal is that it is so Ruby-like, which means I can port quite a bit of existing Ruby code relatively easily. So I am concerned when there is serious talk about making Crystal less Ruby-like, and that includes talk about removing Symbols.

Ruby programmers are encouraged to use Symbols as hash keys rather than Strings. This is not only for performance, but also for program readability. These port over to Crystal only in some cases - they fail when the keys come from outside the program and need to be converted from Strings. As pointed out by others, Crystal also has Enums, which can be used in a very similar way to how Symbols currently work, except that you have to define them (which can be a good thing). Partly because of this, some people on this forum have suggested that Symbols be removed completely from Crystal. I think that would be a bad idea.
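As a minimal sketch of the two styles (the names here are made up for illustration):

```crystal
# Symbol keys work when every key is known at compile time.
config = {:host => "localhost", :port => "8080"}
puts config[:host] # => localhost

# An Enum covers similar ground, but the members must be declared up front.
enum ConfigKey
  Host
  Port
end

config2 = {ConfigKey::Host => "localhost", ConfigKey::Port => "8080"}
puts config2[ConfigKey::Host] # => localhost
```

Neither version works when the keys arrive as Strings at run time, which is the porting problem described above.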

What I would much rather see is Symbols implemented more like they are in Ruby. That includes implementing String#to_sym. Crystal has StringPool, which seems to work in much the same way but is far uglier to use. Although the performance of Symbols would then probably not be much different from using Strings, there would be better Ruby compatibility. People who want ultimate performance can use Enums. Perhaps StringPool could then be removed instead?
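For reference, the StringPool usage in question looks roughly like this (a sketch of the stdlib API):

```crystal
require "string_pool"

pool = StringPool.new
a = pool.get("status") # returns the pooled String for "status"
b = pool.get("status") # same instance on every subsequent call
puts a.same?(b) # => true
```

Pooled strings give you cheap identity comparison, but the call-site syntax is a long way from `:status`.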


#2

I think this is fundamentally a semantic issue. Ruby goes out of its way to make a distinction between strings as opaque data and symbols as specifically names. Symbols are the native class for method and instance-variable identifiers, as well as a central element that makes the key: value notation work so well.

Well, Crystal doesn’t really have run-time reflection, so the first point is moot, and the key: value notation is reused for NamedTuples, so there you go.
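That reuse looks like this in practice:

```crystal
# In Crystal, key: value inside braces builds a NamedTuple, not a Hash:
nt = {name: "Ada", lang: "Crystal"}
puts nt[:name] # => Ada

# The keys are part of the type, fixed at compile time:
puts typeof(nt)
```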

I still think some kind of “strings that are names” with a distinct syntax would probably be a good idea, and Symbols are already there, but after a while I’ve become somewhat less hot on this topic, since I’ve stopped using free-form hashes as a primary data structure and moved to JSON.mapping classes. I mean, the compiler gives you names at compile time, why not use them?


#3

Hi @stronny, the problem for me is that some new “strings that are names” mechanism would just create another construct similar to Symbols and Enums. And since the compiler already gives you names at compile time in the form of Enums, it makes sense to me to change Symbols to be more Ruby-like without adding new syntax. I accept that the {key: value, ...} notation is already used by NamedTuples - my main issue is that there is lots of Ruby code that gets fields from the outside world and immediately maps them to Symbols (including code I wrote!), and I see no good reason why this shouldn’t be possible in Crystal.

For fun, I am actually trying to have a look at the compiler in my spare time to see what scope there is for this. My thinking is that perhaps Symbols could be compiled almost exactly the way they are now, but add a mapping from their name as a string to the symbol value (assuming that it doesn’t exist at the moment). Then if a program uses String#to_sym we could at that point create a hash table of all the symbols, allowing new Symbols to be created dynamically and subsequent String#to_sym calls to work efficiently. That means that if String to Symbol mapping is never used, there is no performance penalty. Perhaps there is a hole in this - I’m not sure yet!
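The dynamic half of that scheme can be sketched in plain user code (a hypothetical illustration; SymTable and intern are made-up names, not working compiler changes):

```crystal
# Hypothetical sketch: compile-time symbols keep their fixed IDs;
# interning an unseen name hands out a fresh ID past that range.
class SymTable
  def initialize(compiled : Hash(String, Int32))
    @table = compiled
    @next_id = (compiled.values.max? || -1) + 1
  end

  def intern(name : String) : Int32
    if id = @table[name]?
      id
    else
      @table[name] = @next_id
      @next_id += 1
      @table[name]
    end
  end
end

t = SymTable.new({"foo" => 0, "bar" => 1}) # the compiler-built mapping
puts t.intern("foo") # => 0 (already known at compile time)
puts t.intern("baz") # => 2 (allocated dynamically)
puts t.intern("baz") # => 2 (stable on repeated calls)
```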


#4

Symbols are definitely a really nice feature. But they work better in a dynamic language like Ruby. In Crystal there is essentially no real benefit over using either enums (for a fixed set of values, and also gives type safety) or strings (for dynamic values).

In Crystal, symbols are implemented completely statically. They’re essentially converted to integers by the compiler and thus need to be fully known at compile time. You can’t add symbols dynamically by converting a string as in Ruby. Without that, you could only map strings to known symbols. That’s certainly not a desirable solution because it can’t go the full way.

In order to be able to do that, the symbol implementation would need to be completely changed, making symbols essentially equivalent to strings. That’s going to impact performance and reduce the advantage of symbols. Using a hash table won’t be able to mitigate this. All in all, this would make symbols more complex and less useful.

The major use case for symbols right now is for named tuple keys. If we trade away efficiency for flexibility, this use case vanishes, giving even more reason to remove symbols entirely from the language.

Please name a good reason why this should be possible instead of turning the argument around?

Implemented that way, there would be no essential difference to simply using strings directly. It would just be a very similar alternative without any major benefits. And it makes it harder to use because people need to decide between both which can be very cumbersome.

In Ruby this is already known to be a problem, which resulted in “solutions” like HashWithIndifferentAccess.
Without symbols, there wouldn’t be such problems in the first place.

Matz even tried to remove them from Ruby, but that would have broken too much code. In Crystal we can do so.

I know it seems hard at first. But when you think about it a bit longer, you’ll notice that you can do very well without symbols. I can’t remember having used a single symbol in Crystal for anything other than named tuples and autocasted enum values.


#5

To be fair, this

And it makes it harder to use because people need to decide between both which can be very cumbersome.

is not a problem if you add autocasting from symbols to strings. There is also no static string in Crystal, which again will lead to difficulties trying to optimize for allocations.

Or you may go the other way and just change the syntax so that :text will also be a String. Most people don’t care about optimizations imo, they just like the syntax.


#6

@straight-shoota wrote:

Please name a good reason why this should be possible instead of turning the argument around?

I think you only have to read my first post in this topic to answer that. There are way more Ruby programmers than Crystal programmers, and way more Ruby code than Crystal code in existence, and making it harder to port just makes Ruby programmers less likely to use Crystal. I think I am a reasonable example of this. I started off playing with Crystal by trying to compile existing Ruby code. I hit several roadblocks on the way, some of which I accept as reasonable, and some of which I do not. Being persistent, and because I really like the concept of Crystal, I thought it would be better to voice my concerns & suggestions here rather than go away and drop Crystal quietly.

<soapbox>
This discussion comes down to why you believe Crystal should exist. If it is merely a language “inspired by Ruby” which will go off in its own direction, I think it will most likely wither and die like most of the multitude of languages out there today. However, if it implements as much of Ruby as is feasible, then its chances of success are much higher, as it will keep on attracting Rubyists.

I accept that certain features in Ruby are not feasible in Crystal. I also accept that Ruby has evolved and has left in a certain amount of junk which could and perhaps should be removed, as has been done in Crystal. I even accept that Crystal has corrected Ruby’s English grammar (include? -> includes?) :smiley:. However, I think one of Crystal’s goals should be to maintain good Ruby compatibility, certainly in commonly used features (such as Symbols and dare I say it, also Strings, which I would argue should be made mutable, but that’s another topic), whether or not that means it is a “perfect” language.
</soapbox>


#7

I understand that Crystal compiles Symbols to a number. I think you have misunderstood my suggestion, which actually would not impact performance of Symbols and does not make them into strings. They are left as-is, with the compiler adding a run-time accessible list of symbol names & their values if it is not already available. That is all. Other changes would be at run-time to add support for to_sym. Only if to_sym is actually used would a hash table of Symbols be built. When a new Symbol is created, a new number would be allocated for it, different from all the other numbers that have been assigned to existing Symbols, including those done by the compiler.


#8

I agree with @mselig, in principle it seems possible to add the extra flexibility without sacrificing the existing runtime performance. Just a small compilation overhead would perhaps be needed.

The basic idea is that the compiler would build a String->Int32 mapping, with keys being the symbol names it has seen in the code and the values would be the Int32s they map to.

I suppose that would be a rather small overhead, if any, because the compiler already needs to somehow keep track of the symbol names it has seen to ensure that the same name always maps to the same number.

The only method that would ever make use of this mapping would be String#intern (== String#to_sym). It would do a lookup into the aforementioned mapping and add a new element if the symbol name has not been seen yet.

This way only the users of String#intern would pay the runtime price of the added flexibility, and since String#intern does not exist yet, no existing code would be affected.

The difference to what @mselig wrote above is that the “symbol table” would always be built, independently of the String#intern usage, because

  1. I believe this table must already be available in some form to ensure that the symbol->Int32 mapping is unique, and
  2. if it is not already available, one would not like to re-scan the entire code if String#intern is encountered in some deep require.

@straight-shoota: do you see any fundamental problems with this idea?


#9

There’s the problem that the symbol table could grow indefinitely, especially as a result of users passing malicious data, and crash your program. This affected Ruby in the past. Now they GC the symbols. If we go that route it will only get more and more complex.

As one of the original designers of the language I see no point in having symbols anymore. I’d like to remove them. Enums serve that purpose very well.

It doesn’t matter if people that come from Ruby have to do things in a different way. People are already migrating to languages much different than Ruby and Crystal already.


#10

Hi @noc, Yes, what you are saying is what I was suggesting. Though I am a bit confused about where the method String#intern came from or what it is - wouldn’t it be better to be compatible with Ruby and call it to_sym?

Actually, it may be possible that this could be implemented with no compiler changes at all, if there is a way of enumerating the symbol table at runtime and determining which of the entries are actually Symbols (in the Crystal & Ruby class meaning). Currently I do not know the compiler well enough to know if this is possible or whether a special table needs to be built.

Other people have commented that a program could simply just use Strings in the first place (which requires minimal editing of existing Ruby code) without conversion to a Symbol. However there is a performance penalty in that each time a string is used as a hash table key, it has to be hashed. Also testing equality of such Strings is slower than equality of Symbols. Using a StringPool is another option to address these performance issues, but its usage as a constant is ugly (pool.get("str") vs :str) - you’d want to create a macro, I’d imagine. Anyhow, I think my suggestion for Symbols is clean and provides much better Ruby compatibility.
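To make the pool.get(&quot;str&quot;) noise concrete, the macro idea might look like this (sym and POOL are made-up names, just an illustration):

```crystal
require "string_pool"

POOL = StringPool.new

# A tiny macro so call sites read almost like a symbol literal.
macro sym(name)
  POOL.get({{name}})
end

a = sym("status")
b = sym("status")
puts a.same?(b) # => true (one shared String instance, so equality is cheap)
```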


#11

Hi @mselig, IIRC String#to_sym is just an alias for String#intern in Ruby.

As for the performance, I did a quick ad-hoc “benchmarking” by populating a hash with 2 million entries and then looking up a particular key repeatedly. The results are:

EDIT: forget it, wasn’t compiling crystal with --release, and with --release, the relevant code just gets evaluated at compile time.

EDIT 2: here the new “benchmark”, the CPU time used seems to scale with the number of loop iterations (here 10 million):

$ ./hash_speed_vs_key_type
Int32 populate: N=10000000 5.282001s == 1893221.9058648418/s 
Int32 lookup: N=10000000 0.162289s == 61618470.752792865/s 
String populate: N=10000000 10.443109s == 957569.244944202/s 
String lookup: N=10000000 0.391429s == 25547417.283849686/s   

So it seems that string keys entail a ca. 2.5x performance penalty, which is quite a lot.
Enums on the other hand are quite impractical, if not impossible to use if you don’t know beforehand what keys your hash might contain.
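For reference, a benchmark along those lines might be set up like this (a sketch, not the original hash_speed_vs_key_type code, with a smaller N so it runs quickly):

```crystal
require "benchmark"

n = 100_000
int_hash = Hash(Int32, Int32).new
str_hash = Hash(String, Int32).new
keys = Array.new(n) { |i| i.to_s } # pre-build keys so String allocation isn't timed

Benchmark.bm do |bm|
  bm.report("Int32 populate")  { n.times { |i| int_hash[i] = i } }
  bm.report("String populate") { n.times { |i| str_hash[keys[i]] = i } }
  bm.report("Int32 lookup")    { n.times { int_hash[42]? } }
  bm.report("String lookup")   { n.times { str_hash["42"]? } }
end
```

As noted in the first EDIT, such a loop must be built with --release in mind, or the compiler may fold the whole thing away.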

In addition to that, symbols are syntactically sooo much nicer than strings, even if they are abandoned as a separate type, I hope the syntax will remain valid and we will just get strings instead of symbols out of it.


#12

@asterite: OK, I get your point, between Enums and Strings, there is little place left for Symbols as a separate type.
But would that mean that the nice syntax of symbols would also be gone, i.e. would
h = {a: 1, b: 2} then necessarily become h = {"a" => 1, "b" => 2}, or would the symbol syntax remain valid, but just produce (say) strings instead of symbols?


#13

Ruby isn’t the only language with Symbols: Scala has them too. Being a bytecode-compiled language, Scala may have more in common with Crystal — I don’t know the semantics of symbols in Scala.

@mselig I take your desire to make or keep Crystal more like Ruby to heart. I’ve been a ruby dev for a decade and I love so much about it. But Crystal is a very different language from ruby. Java is a very different language from C, though they also share a lot of syntax.

Crystal’s goal as a syntax is very similar to Ruby’s: to make a language that is very high level and easy to use. Crystal also has another core tenet: speed. I’m here for the speed. The high-level syntax is a major perk, and the Ruby-ish stdlib is too.

When you make a transition from one language to another, it takes time to adjust your thinking to become idiomatic. It is easy to write idiomatic Java but with Ruby characters – I’ve seen many new-to-Ruby coworkers do it. When you first come to Crystal, it’s easy to just write Ruby and watch that it works and is fast, but I submit that it is actually not Crystal you’re writing.

As I learned more and more Crystal, I relaxed my dependence on Symbols. They’re unnecessary at compile time, which is where I was using them most, building a DSL of some sort. (Just leave off the : and there is no need to make a string.) You may be using symbols in places you shouldn’t, even in Ruby. For me, String#to_sym is strictly bad practice in Ruby, but that comes from the past vulnerabilities.

JSON in a compiler language is a very different beast than in a dynamic one. Mappings are smooth, but extra keys and missing keys suddenly become typing problems. It’s easy to see why the previous attempt at APIs landed with XML and DTDs.

I guess my tldr is, I like symbols. And I don’t use them very often. While I would hate to see them go away, I don’t think it’ll really change what idiomatic Crystal becomes.


#14

I’d like to expand a bit on why I like symbols. My previous post comes off as being more anti symbol than I am.

Symbols and Enums are both examples of magic number programming. The benefit is that you don’t have to remember that a number corresponds to a state, you have names instead. High Level Programming.

Enums are SO much more verbose. MyModule::EnumName::Value vs :value

Enums imply ownership where it is unnecessary. Why should ModuleA::State::Success and ModuleB::State::Success be different? As a library user, it is far more readable to type :success in both cases. It’s more high level.

Symbols are not harder to convert to actual numbers than Enums, Enums just have a DSL. If you assign explicit numbers to your Enum (for maintainability), it’s just as verbose as making a symbol mapping method. Just not as pretty because no DSL has been provided yet.
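To make the comparison concrete (State and STATES are made-up names for illustration):

```crystal
# Enum with explicit numbers, for maintainability:
enum State
  Success = 0
  Failure = 1
end

# Roughly as verbose: a hand-rolled symbol-to-number mapping.
STATES = {:success => 0, :failure => 1}

puts State::Success.value # => 0
puts STATES[:success]     # => 0
```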