String#<=> uses unsafe.memcmp, so it’s a numeric byte sort rather than anything language-aware. Sorting UTF-8 text for a particular language would require a table of character order for that language, so that "ä" and "a" are in proper order relative to each other. A table of characters to ignore in sorting, like "'", is also necessary.
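To make the two-table idea concrete, here is a rough sketch of what I have in mind. WEIGHTS, IGNORED and collation_key are names I made up for illustration, and the weights are a toy ordering rather than real data for any language. It only shows a byte sort and a table-driven sort disagreeing on the same words.

```crystal
# Hypothetical character-order table for a tiny alphabet. A real table
# would come from locale data, not be written by hand.
WEIGHTS = {'a' => 1, 'ä' => 2, 'b' => 3}

# Hypothetical table of characters that are skipped when sorting.
IGNORED = Set{'\'', '-'}

# Build a sort key: drop ignored characters, then map each remaining
# character to its weight. Characters missing from the table sort after
# everything in it, ordered by codepoint (an arbitrary choice here).
def collation_key(s : String) : Array(Int32)
  s.chars
    .reject { |c| IGNORED.includes?(c) }
    .map { |c| WEIGHTS.fetch(c, 1000 + c.ord) }
end

words = ["b", "ä", "a'b", "aa", "a"]

by_bytes = words.sort                                # String#<=>, byte order
by_table = words.sort_by { |w| collation_key(w) }    # table order

p by_bytes # "ä" lands after "b", and "a'b" before "aa"
p by_table # "ä" lands between "a..." and "b", and "a'b" after "aa"
```

The keys are plain arrays of integers, so Array#<=> does the comparison and sort_by does the rest. This is a single level of weights only, so it is nowhere near what a full collation algorithm does.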
Has anyone done this for Crystal? UTS #10: Unicode Collation Algorithm is a treatise on Unicode sorting. It describes a multi-level sort with weights, a lot more than just two tables, but I don’t know of an implementation.
Thanks
Bruce