Unicode as syntax

Hi,

I've mentioned this before in a few places here: restricting ourselves to ASCII in 2023 is maybe a bit old-fashioned, and language syntax could be improved by embracing Unicode.

Well, apparently someone did that for Haskell: Writing prettier Haskell with Unicode Syntax and Vim

I love it, and thought I'd just leave it here in Off Topic as a curio, or food for thought.

I'm pretty sure this is already the case? At least, this code works:

def šŸ
  pp "foo"
end
 
šŸ # => "foo"

The linked article talks about using Unicode in the language syntax itself, such as for keywords. For example, ∀ instead of forall.

Sure, that would work. But why?
It's essentially an alias, just another way to do the same thing. Crystal tries to avoid that, to spare people the burden of learning two things when one suffices.

“26 characters ought to be enough for anybody.” - Bill Gates, probably.
I didn't post this because I want Crystal to switch to Unicode. It's open source after all, and as you pointed out, I could just alias those things.

I just wanted to drop the principle into as many people's minds as possible (without being spammy, I hope), because I feel the concept is valuable.
Maybe one person sees this and in 4 years develops an incredible thing, using the seeds in this pattern.

Also, on the topic of having to learn two things: why is it Array.sum() and not Array.∑()?
Again, I don't want to change Crystal; this should be posted under “Off Topic” (unless I did something wrong).

I just want to plant seeds. in heads. (without it being gross)

For me personally, not having an easy way to access those characters without operating system specific shortcuts (if they even exist) prevents me from wanting to use them in a programming language.

Yes, I guess there's no easy way to input those characters on Linux, and not all Crystalists use a Mac.

Either a language uses Unicode names extensively (APL etc.) or does not use them at all. Anything in between sounds like only doing it for the sake of looking “modern”.

If #∑ is merely an alias of #sum, the vast majority of the people will continue to use the latter. The only way to make sense of the former is to have a standard library where the written names all vanish, or are so rare that they become seen as aliases to symbolic names. Such an approach is inherently incompatible with prior efforts.

I could see using this (after looking at the Haskell link) as a domain-specific language if all the developers are in the domain. For a general-purpose language, give me simple textual words that anyone familiar with programming is going to understand.

🤔 APL?

I suppose you could make a shard full of aliases or macros that implement this sort of thing; then it'd be optional for developers. “Optional” would be key here, imo.
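
As a rough illustration of that optional-shard idea, here is a minimal Crystal sketch that reopens the standard `Enumerable` module and adds a `∑` method that simply forwards to `#sum`. The `∑` method is hypothetical (it is not part of Crystal's standard library), and the sketch relies on Crystal accepting non-ASCII characters in method names, as the emoji example above shows.

```crystal
# Hypothetical alias shard: reopen Enumerable so every Array, Set, Range, etc.
# gains a ∑ method. Nothing here is part of Crystal's standard library.
module Enumerable(T)
  # ∑ just forwards to the standard #sum, so the two can never diverge.
  def ∑
    sum
  end
end

p([1, 2, 3].∑)   # => 6
p([1.5, 2.5].∑)  # => 4.0
```

Because `∑` only forwards to `#sum`, it stays strictly optional, but it also illustrates the objection raised earlier in the thread: readers of a codebase that uses it now have two names for the same method.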

Well, we do have AltGr. I use it for a lot of special inputs like ñ, ±, æ or ™, but I don't believe most distros are configured out of the box to use a layout with AltGr. And it still won't cover most of the symbols we'd see in a programming context.

Another reason: inclusion.

Not having a mathematical background, ∑ means nothing to me.

I would challenge that argument.

The word sum only has meaning to English speakers. For others it doesnā€™t mean anything.
But ∑ is a universal sign for sum. So I'd argue it's technically more inclusive because it doesn't require knowledge of a specific language.

A few counterpoints for your consideration:

While math is universal, our language for it isn't. It's shared across many languages, but I'm a bit skeptical about its universality outside the influence of “Western civilization”. For instance, has China just adopted our math notation, or do they have their own?

sum indeed requires knowledge of English, but so does the rest of Crystal (as pretty much all programming languages do these days). FWIW, sum is exactly the same in Danish, and I suspect in a few other languages too.

Besides, the fact that I managed almost half a century on this earth without being aware of the meaning of the sigma sign is a point in itself. I'd wager that there are fewer people who understand ∑ but not sum than the reverse.

There's a different take: use an editor plugin to prettify the look of the code. For instance, in Emacs' company-coq, every time I write forall, the editor shows ∀. I think this nicely combines the look of “pretty” code (for some people's definition) while keeping the code easily transferable to others.
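
To illustrate the idea only (this is not how company-coq or any real editor plugin is implemented), here is a minimal Crystal sketch of a display-only substitution. The `PRETTY` table and the `prettify` method are hypothetical: they map source tokens to the glyphs an editor might draw, while the source string itself is never changed.

```crystal
# Hypothetical display table: source token => glyph shown on screen.
PRETTY = {
  "forall" => "∀",
  "!="     => "≠",
  "->"     => "→",
}

# Returns the text as an editor *might display* it. The source string is
# never modified, so the file on disk stays plain ASCII.
def prettify(source : String) : String
  text = source
  PRETTY.each do |token, glyph|
    # Word-like tokens only match as whole words ("foralls" stays untouched);
    # operator tokens match wherever they occur.
    pattern = if token =~ /\A\w+\z/
                Regex.new("\\b#{Regex.escape(token)}\\b")
              else
                Regex.new(Regex.escape(token))
              end
    text = text.gsub(pattern, glyph)
  end
  text
end

source = "forall x, x != y"
puts prettify(source) # prints: ∀ x, x ≠ y
puts source           # prints: forall x, x != y
```

The design point is the one made in the post: the substitution lives entirely in the display layer, so collaborators who don't use the plugin still see ordinary ASCII source.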

Keep the actual code and just give it a different visual representation in the editor.
Very interesting, portable, and with minimal impact on other people working on the code base. Nice!

A form of this is already available by enabling font ligatures. For example, in my editor I type != but I see ≠; I use Fira Code Mono.

Someone's already asked a question about doing this with VSCode (here in Stack Overflow).

Hmm. I have to read up on fonts.
Do you happen to know if ligatures are a font feature that needs to be supported by whatever is displaying the characters?
Or is it something that is automatically supported by TrueType font handling?

It's a bit of both.

  1. TrueType supports the concept of ligatures, but the font itself needs to define them.
  2. An editor has to support ligatures when rendering text, and its font engine should support them; it then relies on the data in the font to decide when to use them, whichever font that is.
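
To make that division of labour concrete, here is a rough Crystal sketch; it is a simplification, not a real text-shaping engine. The hypothetical `LIGATURES` hash stands in for the ligature data a font defines (in a real OpenType font that data lives in its GSUB table and operates on glyphs rather than characters), and the hypothetical `shape` method does a greedy longest-match pass, substituting only where the table says a ligature exists.

```crystal
# Hypothetical stand-in for a font's ligature data: character sequence => glyph.
LIGATURES = {"!=" => "≠", ">=" => "≥", "<=" => "≤", "=>" => "⇒"}

# Greedy longest-match substitution. The shaper itself knows nothing about
# specific ligatures; everything it substitutes comes from the table passed in.
def shape(text : String, table : Hash(String, String)) : String
  chars = text.chars
  max_len = table.keys.max_of(&.size)
  String.build do |io|
    i = 0
    while i < chars.size
      glyph = nil
      len = max_len
      # Try the longest candidate sequence first, then shorter ones.
      while len >= 2 && glyph.nil?
        if i + len <= chars.size
          glyph = table[chars[i, len].join]?
        end
        len -= 1 if glyph.nil?
      end
      if glyph
        io << glyph # ligature found: emit one glyph for the whole sequence
        i += len
      else
        io << chars[i] # no ligature starts here: emit the character as-is
        i += 1
      end
    end
  end
end

puts shape("a != b && c >= d", LIGATURES) # prints: a ≠ b && c ≥ d
```

Swapping in a different table changes the output without touching `shape`, which mirrors point 2: the rendering side only consults whatever ligature data the font provides.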

VSCode supports ligatures; you just have to enable them in the settings. I'm not sure about other editors, but I assume most modern editors/IDEs support them as well. So you only have to solve #1, which is to modify a font so as to define custom ligatures. The Stack Overflow post links to an open source font that you could fork and tweak:

A popular example of this is Fira Code, which is a modified version of the OFL-licensed Fira Mono, but with the ligature glyphs drawn specifically for the project.

There is a script for using those glyphs automatically in other fonts and generating the feature code, for fonts where the license allows modifications: GitHub - ToxicFrog/Ligaturizer: Programming Fonts with Ligatures added (& a script to add them to other fonts)

Thank you for the pointer in this direction, I will certainly try this out!