I want to print the Unicode NAMES of whitespace and control characters (ASCII code < 14), e.g. ‘SPACE’, ‘NO-BREAK SPACE’, ‘HORIZONTAL TAB’, etc.
Non-working Python code :-)
import unicodedata, string
for e in string.whitespace + unicodedata.lookup("GREEK SMALL LETTER ALPHA"):
    print(ord(e))
    print(unicodedata.name(e))  # ValueError: no such name (raised for '\t')
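The error happens because `unicodedata.name()` only returns the Name field from UnicodeData.txt, and that field is empty for control characters (they are listed as `<control>`). Names such as CHARACTER TABULATION (also aliased as HORIZONTAL TABULATION) or LINE FEED live in the separate NameAliases.txt file. `unicodedata.lookup()` accepts those aliases (since Python 3.3), but `name()` never returns them. A minimal sketch, assuming a small hand-written alias table; `C0_ALIASES` and `char_name` are hypothetical names, not part of `unicodedata`:

```python
import string
import unicodedata

# Hypothetical table: formal "control" aliases for U+0000..U+000D,
# transcribed from the Unicode NameAliases.txt file. unicodedata.name()
# does not return these, because control characters have no Name field.
C0_ALIASES = {
    0x00: "NULL",
    0x01: "START OF HEADING",
    0x02: "START OF TEXT",
    0x03: "END OF TEXT",
    0x04: "END OF TRANSMISSION",
    0x05: "ENQUIRY",
    0x06: "ACKNOWLEDGE",
    0x07: "ALERT",
    0x08: "BACKSPACE",
    0x09: "CHARACTER TABULATION",
    0x0A: "LINE FEED",
    0x0B: "LINE TABULATION",
    0x0C: "FORM FEED",
    0x0D: "CARRIAGE RETURN",
}


def char_name(ch: str) -> str:
    """Return the Unicode name of ch, falling back to its control alias."""
    name = unicodedata.name(ch, None)  # None instead of raising ValueError
    if name is not None:
        return name
    return C0_ALIASES.get(ord(ch), f"<unnamed U+{ord(ch):04X}>")


# unicodedata.lookup() does accept aliases, so the table can be sanity-checked.
for code, alias in C0_ALIASES.items():
    assert unicodedata.lookup(alias) == chr(code), alias

for ch in string.whitespace + unicodedata.lookup("GREEK SMALL LETTER ALPHA"):
    print(f"U+{ord(ch):04X}  {char_name(ch)}")
```

For `string.whitespace` this prints SPACE, CHARACTER TABULATION, LINE FEED, CARRIAGE RETURN, LINE TABULATION and FORM FEED, followed by GREEK SMALL LETTER ALPHA. Characters that do have a Name field, such as U+00A0, come straight from `unicodedata.name()` ('NO-BREAK SPACE').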
This Python code works, but it does not print the NAMES:
str_whitespace = string.whitespace
print(ascii(str_whitespace)) # ' \t\n\r\x0b\x0c'
print(str_whitespace.encode()) # b' \t\n\r\x0b\x0c'
How can I make this work in the Crystal programming language?
Good question. As far as I know, this information isn’t exposed anywhere at the moment. It looks like the script that generates Crystal’s Unicode data captures it, but it isn’t used to power any API.
It would probably be doable to expose it, if it’s something people want in the stdlib. Otherwise, it could be a good idea for a shard.
It seems that other languages don’t expose it either. There must be a good reason why… I am just curious.
So there is probably no good reason to have it in Crystal either.
As a side note: how does Crystal test whether a character is non-printable, like Python’s str.isprintable()?
I presume the main reason not to have this is that it’s not used very often.
Apparently you can figure this out by checking that the character’s category is not Cc or Cn, as all characters outside those categories are printable. However, I can’t seem to find a way to get a character’s category. It would probably be enough to just use something like !char.control? and call it a day, given that there are no characters in the Cn (unassigned) category at the moment.
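In Python, the category mentioned above is available through `unicodedata.category()`. The sketch below compares the narrower Cc-only test with a category rule that mirrors how the Python documentation defines `str.isprintable()`: characters in the "Other" (C*) or "Separator" (Z*) categories are non-printable, except the ASCII space. That rule is broader than "not Cc or Cn": it also treats format characters (Cf) such as U+200B ZERO WIDTH SPACE and separators such as U+00A0 NO-BREAK SPACE as non-printable. The helpers `is_control` and `is_printable` are hypothetical names used only for illustration.

```python
import unicodedata


def is_control(ch: str) -> bool:
    """Narrow test: only the Cc (control) category counts as non-printable."""
    return unicodedata.category(ch) == "Cc"


def is_printable(ch: str) -> bool:
    """Category rule documented for str.isprintable(): anything in an
    "Other" (C*) or "Separator" (Z*) category is non-printable, except
    the ASCII space U+0020."""
    if ch == " ":
        return True
    return unicodedata.category(ch)[0] not in ("C", "Z")


# Sample code points: a letter, space, tab, no-break space (Zs),
# zero-width space (Cf) and an unassigned code point (Cn).
samples = ["a", " ", "\t", "\u00a0", "\u200b", "\u0378"]
for ch in samples:
    print(
        f"U+{ord(ch):04X} {unicodedata.category(ch)} "
        f"control={is_control(ch)} printable={is_printable(ch)} "
        f"str.isprintable={ch.isprintable()}"
    )
```

For U+00A0 and U+200B, `is_control()` returns False while both `is_printable()` and `str.isprintable()` return False, which is exactly where a plain control-character check would differ from Python's behaviour.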