RE: Hexadecimal digits?

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Tue Nov 11 2003 - 07:02:23 EST

    (long argument deleted)

    If you are suggesting that the natural sort algorithm won't work without
    separate codepoints for hex digits then you are of course correct, but
    that is an argument in favor of hex-digit-characters, not against them.

    Ordering natural numbers (whole numbers >= 0) expressed as numerals,
    usually sequences of digits, can be made to work for any base as long as
    one can write the digits in a convenient way. (That does NOT mean digit
    clones of A-Z.)
     
    If you like, you can lobby OS/UI makers (or providers of sort-order
    implementations in general) to supply a "hacker's option" where A-F and
    a-f are regarded as digits (possibly with some heuristic to determine
    which As are hex digits and which are not). I would have that "off" by
    default, though; most users would find hexadecimal ordering very
    uncomfortable, and indeed surprising. They would be even more surprised
    to find some As not sorting like other As (if there were such clones)
    while looking just the same. Note that all the existing clones of A-Z
    and a-z are ordered just like the ordinary letters in the default order
    of the UCA (and the CTT of 14651). Likewise, the Roman numeral
    compatibility characters are ordered as the letters that constitute
    them, not in any numeric order.

    The natural sort algorithm works identically in all radices. There is
    nothing special about radix ten. Furthermore, the same sort order is
    guaranteed in all radices. An implementation of a natural sort algorithm
    does NOT need to "know" the radix. It does not need to guess. It does
    not need to assume. It does not need to infer. It does not even need to
    care. All it needs are the functions IsDigit(codepoint) and
    GetDigitValue(codepoint). The return value of the latter is only
    required to be defined if the return value of the former is true. That's
    ALL it needs.
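
    As a minimal sketch of that claim (not code from this thread): the
    Python below builds a sort key using only is_digit() and
    get_digit_value(). The DIGIT_VALUES table is a hypothetical stand-in
    for real character property data and here holds ASCII 0-9 only; the
    choice to sort numerals before other text is also just an assumption.
    Numerals are compared by length (after dropping leading zeros) and
    then digit by digit, so the radix is never consulted.

```python
from itertools import groupby

# Hypothetical stand-in for character property data: ASCII decimal digits only.
# Extending this table is the only change needed to support more digit characters.
DIGIT_VALUES = {str(i): i for i in range(10)}


def is_digit(ch):
    """True if the character is regarded as a digit."""
    return ch in DIGIT_VALUES


def get_digit_value(ch):
    """Numeric value of a digit; only defined when is_digit(ch) is true."""
    return DIGIT_VALUES[ch]


def natural_key(s):
    """Build a sort key that orders embedded numerals by value.

    No radix is needed: once leading zeros are dropped, a longer numeral
    is larger, and numerals of equal length compare digit by digit.
    """
    key = []
    for digit_run, group in groupby(s, key=is_digit):
        if digit_run:
            values = [get_digit_value(ch) for ch in group]
            # Drop leading zeros, but keep at least one digit.
            while len(values) > 1 and values[0] == 0:
                del values[0]
            # Tag 0: numerals sort before other text (an assumption).
            key.append((0, len(values), values))
        else:
            key.append((1, "".join(group)))
    return key


names = ["file10", "file9", "file002", "file1"]
print(sorted(names, key=natural_key))
```

    With the table above this prints ['file1', 'file002', 'file9',
    'file10']. Non-digit runs are compared by plain code point here, which
    is a simplification; a real implementation would weight them with a
    collation table.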

    That's one way of doing it. Another is to prehandle the string, as
    explained in annex C.3 of ISO/IEC 14651, and use suitable weighting
    for the characters used in the numerals, and then just apply the
    ordinary collation key calculation (on demand or complete) and compare
    the strings as "usual" (for 14651 or UCA comparisons). Incidentally,
    that annex also considers negative numerals, and numerals with a
    fraction part. It only considers decimal base in the examples, but
    there is no problem in generalising to other integer bases >= 2, just
    as long as you have enough characters to express the digits (which
    could in principle be expressed with multiple characters each, even a
    varying number).
    (If you use a base greater than decimal, then you're right that decimal
    numerals order in the expected way, having done the prehandling,
    as long as you stick to decimal digits in the actual strings.)
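
    To illustrate the general prehandling idea (this is a sketch under my
    own assumptions, not the procedure given in the annex): each decimal
    numeral is rewritten with a two-digit length prefix, after which an
    ordinary string comparison orders the numerals by value. The name
    prehandle() and the prefix format are hypothetical, plain code point
    comparison stands in for a real 14651/UCA collation key, and negative
    numerals and fraction parts are not handled.

```python
import re


def prehandle(s):
    """Rewrite each run of ASCII decimal digits so plain comparison sorts by value.

    Each numeral loses its leading zeros and gains a two-digit length
    prefix, so a shorter numeral always compares lower than a longer one.
    Assumption: no numeral has more than 99 significant digits.
    """
    def rewrite(match):
        digits = match.group().lstrip("0") or "0"
        return f"{len(digits):02d}{digits}"

    # [0-9] rather than \d, so only ASCII decimal digits are treated as digits.
    return re.sub(r"[0-9]+", rewrite, s)


names = ["file10", "file9", "file002", "file1"]
# Plain string comparison stands in for the ordinary collation step.
print(sorted(names, key=prehandle))
```

    Here "file9" becomes "file019" and "file10" becomes "file0210", so the
    usual comparison puts file9 first without any numeric logic in the
    comparison itself.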
     
            /kent k
     




