Re: Processing Digit Variants

From: Philippe Verdy <>
Date: Thu, 21 Mar 2013 22:51:24 +0100

2013/3/20 Richard Wordingham <>:
> On Wed, 20 Mar 2013 07:34:38 -1000
> Markus Scherer <> wrote:
>> More processing in the bowels
>> of the collation code would be very complicated, and ambiguous:
>> "file-5.txt" is probably file number 5 rather than file minus five.
> And "section-2.3.txt" should probably be sorted before
> "section-2.12.txt"!

This also demonstrates that the generic ASCII full stop found in
identifiers such as filenames should not be interpreted as a decimal
point by identifier collators.
If one wants to sort filenames in which the dot is to be interpreted
unambiguously as a decimal point, that decimal point should be encoded
distinctly (just as the hyphen-minus is replaced by the mathematical
minus sign).

Identifier collators have to follow different rules than generic
collators that parse normal text actually written in human languages:
they can only process unsigned integers and nothing else. That also
excludes grouping separators, so a file named "item 1,000.txt" will
sort **before** "item 2.txt", while "item 1000.txt" will sort after
"item 2.txt". There is no clear way to encode the commas, breaking or
non-breaking spaces, dots, or apostrophes that may be used as grouping
separators, nor any standardized sequence combining these punctuation
or whitespace characters with a semantic variation selector.
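
To make the ordering above concrete, here is a minimal sketch in
Python of an identifier sort key that recognizes only runs of unsigned
integers. The name identifier_key is hypothetical, the sketch assumes
ASCII digits only, and it compares the non-digit pieces by plain code
point order, which is a simplification and not a real collation.

```python
import re

# Assumption: only runs of ASCII digits count as numbers. Signs, dots,
# commas and other separators are left as ordinary text.
_DIGIT_RUN = re.compile(r"([0-9]+)")


def identifier_key(name):
    """Hypothetical sort key: text pieces alternate with unsigned integers."""
    # With a capturing group, re.split returns text at even indices and
    # the captured digit runs at odd indices, so two keys always compare
    # text with text and integer with integer.
    parts = _DIGIT_RUN.split(name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = [
    "item 2.txt",
    "item 1000.txt",
    "item 1,000.txt",
    "section-2.12.txt",
    "section-2.3.txt",
    "file-5.txt",
]

ordered = sorted(names, key=identifier_key)
for name in ordered:
    print(name)
```

With this key, "item 1,000.txt" is read as the integer 1 followed by a
comma and the integer 0, so it lands before "item 2.txt", whereas
"item 1000.txt" lands after it. Likewise "section-2.3.txt" comes before
"section-2.12.txt", because the dot is only a separator between two
unsigned integers, and the hyphen in "file-5.txt" is never taken as a
minus sign.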
Received on Thu Mar 21 2013 - 16:56:25 CDT
