I want to print the Unicode NAMES related to whitespace (ASCII code < 14), e.g. 'SPACE', 'HORIZONTAL TAB', etc.

masiarek · June 27, 2021, 7:59pm

I want to print the Unicode NAMES related to whitespace and control characters (ASCII code < 14), e.g. 'SPACE', 'NO-BREAK SPACE', 'HORIZONTAL TAB', etc.

non-working Python code :-)

import unicodedata, string
for e in string.whitespace + unicodedata.lookup("GREEK SMALL LETTER ALPHA"):
    print(ord(e))
    print(unicodedata.name(e))  # raises ValueError ("no such name") for control characters such as '\t'

this Python works - but I am unable to print NAMES:

import string
str_whitespace = string.whitespace
print(ascii(str_whitespace))  # ' \t\n\r\x0b\x0c'
print(str_whitespace.encode())  # b' \t\n\r\x0b\x0c'

How can I make this work in the Crystal programming language?

Blacksmoke16 · June 27, 2021, 8:17pm

Good question. As far as I know this information isn’t exposed/available anywhere at the moment. It looks like we’re capturing the data in the script that generates the Unicode data but aren’t using it to power any sort of API or something.

Probably would be doable to expose that, if it’s something that would be wanted in the stdlib. Otherwise, could be a good idea for a shard.

masiarek · June 27, 2021, 8:23pm

It seems that other languages are not exposing it either. There must be a good reason why… I am just curious.

Probably there is not a good reason to have it in Crystal either.

As a side note: how does Crystal test whether a character is printable, e.g. the way Python does with str.isprintable()?

straight-shoota · June 27, 2021, 8:39pm

I presume the main reason not to have this is that it’s not very frequently used.

masiarek · June 27, 2021, 8:41pm

I found a helpful link: Unicode Mail List Archive: Re: Unicode 1.0 names for control ch

Blacksmoke16 · June 27, 2021, 8:45pm

Apparently you can figure this out by checking that the category of the character is not Cc or Cn, as all characters outside those categories are printable. However, I can’t seem to find a way to get a character’s category. It would probably be enough to just use something like !char.control? and call it a day, given there are no characters in the Cn category at the moment.
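The category check described here can be sketched in Python, which exposes general categories through unicodedata.category. This is only an illustration of the idea, not Crystal code: the helper name is_printable_by_category is hypothetical, and Python's own str.isprintable() applies a stricter rule (it also rejects separators other than the ASCII space), so the two columns can disagree.

```python
import unicodedata


def is_printable_by_category(ch: str) -> bool:
    """Hypothetical helper: printable unless the general category is Cc or Cn."""
    return unicodedata.category(ch) not in ("Cc", "Cn")


# Compare the category-based rule with Python's built-in str.isprintable().
for ch in ["a", " ", "\t", "\n", "\u00a0"]:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        is_printable_by_category(ch),
        ch.isprintable(),
    )
```

Under this rule NO-BREAK SPACE (category Zs) counts as printable, whereas str.isprintable() reports it as non-printable, so which rule is appropriate depends on what "printable" should mean for the use case.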

masiarek · June 28, 2021, 12:42pm

Here is a pointer for Python (printing the names of Unicode characters): python - Printing unicode character NAMES - e.g. 'GREEK SMALL LETTER ALPHA' - instead of 'α' - Stack Overflow
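For reference, a minimal Python sketch of what the original question is after. unicodedata.name() has no name for control characters, so it raises ValueError unless a default is passed. The CONTROL_ALIASES table and the char_name helper below are hypothetical and written by hand: the strings are the Unicode name aliases for these five code points as far as I know, so 'HORIZONTAL TAB' shows up as 'CHARACTER TABULATION'.

```python
import string
import unicodedata

# Hand-written fallback names (assumption: taken from the Unicode name
# aliases for the ASCII whitespace control characters).
CONTROL_ALIASES = {
    "\t": "CHARACTER TABULATION",
    "\n": "LINE FEED",
    "\x0b": "LINE TABULATION",
    "\x0c": "FORM FEED",
    "\r": "CARRIAGE RETURN",
}


def char_name(ch: str) -> str:
    """Return the Unicode name, falling back to the alias table above."""
    name = unicodedata.name(ch, None)  # None instead of raising ValueError
    if name is not None:
        return name
    return CONTROL_ALIASES.get(ch, f"<unnamed U+{ord(ch):04X}>")


alpha = unicodedata.lookup("GREEK SMALL LETTER ALPHA")
for ch in string.whitespace + "\u00a0" + alpha:
    print(ord(ch), char_name(ch))
```

Any control character outside the small table falls through to a placeholder such as <unnamed U+0000>, which keeps the loop from failing on input the table does not cover.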
