From: Philippe Verdy <verdy_p_at_wanadoo.fr>

Date: Wed, 20 Mar 2013 21:05:35 +0100

2013/3/20 Markus Scherer <markus.icu_at_gmail.com>:

> Numeric collation is actually much more limited than number parsing, to
> strictly strings of digits, not including sign (thus only non-negative),
> decimal, exponent, etc. More processing in the bowels of the collation code
> would be very complicated, and ambiguous: "file-5.txt" is probably file
> number 5 rather than file minus five.
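
A minimal sketch of the behaviour quoted above, assuming a simplified model of numeric collation: only unsigned runs of decimal digits are compared by value, and everything else, including "-", is compared as ordinary text. The function name numeric_collation_key is hypothetical. This is not ICU's implementation (ICU exposes the real option as the UCOL_NUMERIC_COLLATION attribute); it only shows why "file-5.txt" ends up as file number 5.

```python
import re

# Capturing split: even indices are text runs, odd indices are digit runs.
_DIGIT_RUN = re.compile(r"(\d+)")

def numeric_collation_key(s: str):
    """Hypothetical sort key: digit runs compare by numeric value,
    all other characters (including '-') compare as plain text."""
    key = []
    for i, part in enumerate(_DIGIT_RUN.split(s)):
        if i % 2:
            # Digit run: compare by value, keep the original digits as a tie-breaker.
            key.append((int(part), part))
        else:
            # Text run: a '-' here is just punctuation, never a sign.
            key.append(part.casefold())
    return key

names = ["file-10.txt", "file-5.txt", "file-9.txt"]
print(sorted(names, key=numeric_collation_key))
```

Because the split always starts with a (possibly empty) text run, position i holds the same kind of element in every key, so the lists compare element by element without type clashes.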

File names are identifiers; they are not real human language. They
don't obey any grammatical rule of any language: even if they may be
named according to some convention in a given language, they are
frequently abbreviated and use a reduced set of characters.

So collation parsing of numbers for sorting filenames is in fact
collation parsing of technical identifiers. It would be different when
performing collation on true text, like a book, or even on OCR'd
facsimiles of accounting reports being prepared to rebuild a
spreadsheet.

Imagine you import a list of filenames into a spreadsheet: the column
type would be set to "text", not numbers. In such cases, sorting as
"text" should use the sort options appropriate for sorting
identifiers. Numbers imported into a "number" column should instead be
converted from any number format, accepting signs and exponent
notation, and correctly filtering out format control characters, to
compute the effective value.

So for converting formatted numbers to effective numeric values, the
lenient parsing should be used (numbers will then not sort using
collation, but using their effective numeric value after this
operation).

If the lenient parsing of numbers fails, the column in the spreadsheet
will be treated as "text" and will sort with collation, but with a
reduced supported format for numbers (so the ambiguous ASCII
hyphen-minus will effectively be treated as hyphen punctuation, not as
a minus sign).
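
The two cases above (a "number" column sorted by effective value, and a "text" column sorted by collation) could be sketched like this. Everything here is an assumption for illustration: lenient_parse, text_key and sort_column are hypothetical names, "lenient parsing" is reduced to dropping format controls (Unicode general category Cf, e.g. U+200E LEFT-TO-RIGHT MARK), mapping U+2212 MINUS SIGN to "-" and letting Python's float() handle sign, decimals and exponent. Locale-specific grouping and decimal separators are deliberately not handled.

```python
import math
import re
import unicodedata

def lenient_parse(cell: str):
    """Hypothetical lenient parser: returns a float, or None on failure."""
    # Drop invisible format controls such as U+200E LEFT-TO-RIGHT MARK.
    cleaned = "".join(ch for ch in cell if unicodedata.category(ch) != "Cf")
    # Accept the mathematical MINUS SIGN as a sign, like the ASCII one.
    cleaned = cleaned.replace("\u2212", "-").strip()
    try:
        value = float(cleaned)  # handles '+', '-', decimals, 'e' exponents
    except ValueError:
        return None
    # Reject inf/nan so a numeric column always has a total order.
    return value if math.isfinite(value) else None

_DIGIT_RUN = re.compile(r"(\d+)")

def text_key(s: str):
    # Text fallback: unsigned digit runs by value, '-' stays punctuation.
    parts = _DIGIT_RUN.split(s)
    return [(int(p), p) if i % 2 else p.casefold() for i, p in enumerate(parts)]

def sort_column(cells):
    values = [lenient_parse(c) for c in cells]
    if all(v is not None for v in values):
        # "number" column: sort by effective numeric value, not by collation.
        return [c for _, c in sorted(zip(values, cells))]
    # Lenient parsing failed somewhere: treat the whole column as "text".
    return sorted(cells, key=text_key)

print(sort_column(["-5", "1e3", "\u22122.5", "10"]))
print(sort_column(["file-10.txt", "file-5.txt", "notes.txt"]))
```

The whole column falls back to text as soon as one cell fails to parse, which mirrors the column-type decision described above rather than mixing numeric and text ordering within one column.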

If filenames have to be sorted according to the represented numeric
value, the ambiguous ASCII hyphen-minus should not be used, and the
mathematical MINUS SIGN character (U+2212) should be used in their
names instead (and it should remain interpreted as a sign in the more
restrictive collation parsing of numbers in identifiers).
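
A minimal sketch of that proposal, assuming a hypothetical identifier collation in which a digit run immediately preceded by U+2212 MINUS SIGN is read as a negative number, while ASCII hyphen-minus (U+002D) is never a sign. The name identifier_key is hypothetical; no existing collation library is claimed to behave this way.

```python
import re

# A number is an optional U+2212 MINUS SIGN followed by decimal digits.
# ASCII '-' (U+002D) is not part of the pattern, so it stays in the text runs.
_NUM = re.compile(r"(\u2212?\d+)")

def identifier_key(name: str):
    """Hypothetical identifier sort key with signed numbers via U+2212 only."""
    key = []
    for i, part in enumerate(_NUM.split(name)):
        if i % 2:
            # Signed digit run: U+2212 makes the value negative.
            value = -int(part[1:]) if part.startswith("\u2212") else int(part)
            key.append((value, part))
        else:
            key.append(part.casefold())
    return key

names = ["file2", "file\u22125", "file0", "file-5"]
print(sorted(names, key=identifier_key))
```

With this key, "file\u22125" sorts before "file0" as file minus five, while "file-5" is read as the text "file-" followed by the number 5, so it sorts after every name whose text prefix is just "file".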

Received on Wed Mar 20 2013 - 15:08:06 CDT
