I want to print the Unicode NAMES related to whitespace (ASCII code < 14), e.g. 'SPACE', 'HORIZONTAL TAB', etc.

I want to print the Unicode NAMES related to whitespace and control characters (ASCII code < 14), e.g. 'SPACE', 'NO-BREAK SPACE', 'HORIZONTAL TAB', etc.

non-working Python code :-)

import string
import unicodedata

for e in string.whitespace + unicodedata.lookup("GREEK SMALL LETTER ALPHA"):
    print(ord(e))
    # Fails with ValueError ("no such name") on control characters such as '\t'
    print(unicodedata.name(e))

This Python works, but I am unable to print the NAMES:

import string

str_whitespace = string.whitespace
print(ascii(str_whitespace))  # ' \t\n\r\x0b\x0c'
print(str_whitespace.encode())  # b' \t\n\r\x0b\x0c'

How can I make this work in the Crystal programming language?

Good question. As far as I know this information isn’t exposed/available anywhere at the moment. It looks like we’re capturing the data in the script that generates the Unicode data but aren’t using it to power any sort of API or something.

Probably would be doable to expose that, if it’s something that would be wanted in the stdlib. Otherwise, could be a good idea for a shard.

It seems that other languages are not exposing it either. There must be a good reason why… I am just curious.

Probably there is not a good reason to have it in Crystal either.

As a side note: how does Crystal test whether a character is printable, e.g. like Python's str.isprintable()?

I presume the main reason not to have this is because it’s not very frequently used.

I found a helpful link: Unicode Mail List Archive: Re: Unicode 1.0 names for control ch

Apparently you can figure this out by checking that the category of the character is not Cc or Cn, as all characters not in those categories are printable. However, I can’t seem to find a way to know what a character’s category is. But it would probably be enough to just use something like !char.control? and call it a day, given that Cn only covers code points that have no assigned character :man_shrugging:.
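For comparison, here is a minimal Python sketch of the Cc/Cn rule described above, since Python's unicodedata module does expose the general category. The helper name is_printable_by_category is made up for illustration, and the rule is the one from this post, not an official definition of printability.

```python
import unicodedata


def is_printable_by_category(ch: str) -> bool:
    """Hypothetical helper: printable unless Cc (control) or Cn (unassigned)."""
    return unicodedata.category(ch) not in {"Cc", "Cn"}


# Compare the category rule with Python's built-in str.isprintable()
for ch in ["a", " ", "\t", "\u00a0", "\u03b1"]:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        is_printable_by_category(ch),
        ch.isprintable(),
    )
```

Python's own str.isprintable() is stricter than this rule: it treats every character in the "Other" and "Separator" categories as non-printable, except the ASCII space. NO-BREAK SPACE (category Zs) therefore passes the Cc/Cn check but fails isprintable().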

Here is a pointer for Python (printing the names of Unicode characters): python - Printing unicode character NAMES - e.g. 'GREEK SMALL LETTER ALPHA' - instead of 'α' - Stack Overflow
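On the Python side, the loop from the question fails because control characters have no formal Unicode name, so unicodedata.name() raises ValueError for them. A rough sketch that works around this is below. It assumes a small hand-written table of name aliases for the control characters in string.whitespace; the names CONTROL_ALIASES and describe are hypothetical, not part of any library.

```python
import string
import unicodedata

# Hand-written subset of Unicode name aliases for ASCII control characters.
# Assumption: only the control characters in string.whitespace are covered.
CONTROL_ALIASES = {
    0x09: "CHARACTER TABULATION",
    0x0A: "LINE FEED",
    0x0B: "LINE TABULATION",
    0x0C: "FORM FEED",
    0x0D: "CARRIAGE RETURN",
}


def describe(ch: str) -> str:
    """Return the Unicode name, falling back to the alias table."""
    # With a default argument, name() returns it instead of raising ValueError
    name = unicodedata.name(ch, None)
    if name is None:
        name = CONTROL_ALIASES.get(ord(ch), "<unnamed>")
    return name


chars = string.whitespace + "\u00a0" + unicodedata.lookup("GREEK SMALL LETTER ALPHA")
for ch in chars:
    print(f"U+{ord(ch):04X} {describe(ch)}")
```

A port to Crystal would need the same two pieces: a name table generated from the Unicode data files, and a fallback for code points that have no formal name.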