**From:** Kent Karlsson (*kentk@cs.chalmers.se*)

**Date:** Tue Nov 11 2003 - 07:02:23 EST

**Previous message:** Jill Ramonsky: "RE: Hexadecimal digits?"

**In reply to:** Jill Ramonsky: "RE: Hexadecimal digits?"

**Next in thread:** Jill Ramonsky: "RE: Hexadecimal digits?"

(long argument deleted)

If you are suggesting that the natural sort algorithm won't work without

separate codepoints for hex digits then you are of course correct, but

that is an argument in favor of hex-digit-characters, not against them.

Ordering natural numbers (whole numbers >= 0) expressed as numerals,

usually sequences of digits, can be made to work for any base as long as

one can write the digits in a convenient way. (That does NOT mean digit

clones of A-Z.)

If you like you can lobby OS/UI makers (or sort order implementation

providers in general) to supply a "hacker's option" where A-F and a-f

are regarded as digits (possibly with some heuristic to determine which

As are hex digits and which are not). I would have that "off" by default

though; most users would find hexadecimal very uncomfortable, and

indeed surprising. They would be even more surprised to find that some

As do not sort like other As (if there were such clones), looking just the

same. Note that all the existing clones of A-Z and a-z are ordered just

like the ordinary letters in the default order of the UCA (and the CTT

of 14651). Likewise the Roman numeral compatibility characters are

ordered as the letters that constitute them; not in any numeric order.

The natural sort algorithm works identically in all radices. There is

nothing special about radix ten. Furthermore, the same sort order is

guaranteed in all radices. An implementation of a natural sort algorithm

does NOT need to "know" the radix. It does not need to guess. It does

not need to assume. It does not need to infer. It does not even need to

care. All it needs are the functions IsDigit(codepoint) and

GetDigitValue(codepoint). The return value of the latter is only

required to be defined if the return value of the former is true. That's

ALL it needs.
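A minimal Python sketch of the approach just described, assuming the two functions are passed in as parameters. The names `natural_key`, `is_digit` and `digit_value` are illustrative stand-ins for IsDigit and GetDigitValue, not part of any standard API. Each run of digits is compared first by its length after leading zeros are stripped, then digit by digit, so the radix is never consulted. Non-digit characters are compared by code point only, which is a simplification and not UCA or 14651 ordering.

```python
def natural_key(s, is_digit, digit_value):
    """Build a sort key that orders embedded numerals by magnitude.

    is_digit and digit_value are caller-supplied (hypothetical stand-ins
    for IsDigit and GetDigitValue); the radix is never needed.
    """
    key = []
    i = 0
    while i < len(s):
        if is_digit(s[i]):
            # Collect the whole run of digits starting at position i.
            j = i
            while j < len(s) and is_digit(s[j]):
                j += 1
            values = [digit_value(c) for c in s[i:j]]
            # Strip leading zeros, keeping at least one digit.
            k = 0
            while k < len(values) - 1 and values[k] == 0:
                k += 1
            values = values[k:]
            # Longer numerals are larger; equal lengths compare digit by digit.
            key.append((0, len(values), values, ""))
            i = j
        else:
            # Assumption: plain code point order for non-digits.
            key.append((1, 0, [], s[i]))
            i += 1
    return key


def decimal_key(s):
    # Ordinary decimal digits only.
    return natural_key(s, str.isdecimal, int)


def hex_key(s):
    # The "hacker's option": A-F and a-f are treated as digits too.
    return natural_key(
        s,
        lambda c: c in "0123456789abcdefABCDEF",
        lambda c: int(c, 16),
    )
```

For example, `sorted(["file10", "file9", "file100"], key=decimal_key)` puts `file9` first, and `sorted(["ff", "100", "a"], key=hex_key)` puts `a` first and `100` last. Note that `hex_key` would also treat the "f" and "e" in "file" as digits, which is exactly why some heuristic would be needed for such an option.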

That's one way of doing it. Another is to prehandle the string, as

explained in annex C.3 of ISO/IEC 14651, and use suitable weighting

for the characters used in the numerals, and then just apply the ordinary

collation key calculation (on demand or complete) and compare the

strings as "usual" (for 14651 or UCA comparisons). Incidentally, that

annex also considers negative numerals, and numerals with a fraction

part. It only considers decimal base in the examples, but there is no

problem in generalising to other integer bases >= 2, just as long as you

have enough characters to express the digits (which could in principle be

expressed with multiple characters each, even a varying number).

(If you use a base greater than decimal, then you're right that decimal

numerals order in the expected way, having done the prehandling,

as long as you stick to decimal digits in the actual strings.)
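To illustrate the general idea of prehandling, here is a rough Python sketch. It is not the exact procedure of annex C.3, only one simple way to get the same effect for non-negative decimal integers: each numeral has its leading zeros stripped and is prefixed with its digit count, so that an ordinary left-to-right string comparison orders numerals by magnitude. The function name `prehandle` and the `width` parameter are made up for this example, and plain code point comparison stands in for the real collation key calculation.

```python
import re


def prehandle(s, width=2):
    """Rewrite each ASCII decimal numeral as <digit count><digits>.

    Hypothetical helper: width is the number of characters used for the
    digit count, so numerals of up to 10**width - 1 digits are supported.
    Negative numerals and fraction parts are not handled here.
    """
    def repl(match):
        # Strip leading zeros, keeping a single "0" for the value zero.
        digits = match.group().lstrip("0") or "0"
        if len(digits) >= 10 ** width:
            raise ValueError("numeral too long for the chosen width")
        # Zero-padded digit count followed by the digits themselves.
        return f"{len(digits):0{width}d}{digits}"

    return re.sub(r"[0-9]+", repl, s)
```

With this, `prehandle("file9")` gives `"file019"` and `prehandle("file10")` gives `"file0210"`, so `sorted(names, key=prehandle)` places `file9` before `file10` using nothing but ordinary string comparison.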

/kent k



*
This archive was generated by hypermail 2.1.5
: Tue Nov 11 2003 - 07:50:49 EST
*