Re: numeric properties of Nl characters in the UCD

From: Doug Ewell (dewell@adelphia.net)
Date: Thu Nov 27 2003 - 13:46:18 EST

    Arcane Jill wrote (in rich text):

    > The review on Ethiopic and Tamil non-decimal digits is interesting,
    > but I can't help but feel it was a culturally biased decision (read:
    > mistake) to EVER have had a "radix ten" property without any similar
    > property for any other radix, thereby forcing non-decimal digits to
    > end up being classified as No (Other_Number) instead of Nd
    > (Number_Decimal).

    I think the charge of cultural bias is overstated. The *vast* majority
    of cultures on Earth use a base-10 positional system.

    On the contrary, having a property that associates all the various DIGIT
    NINEs, from Latin to Arabic-Indic to Oriya to Lao to Ethiopic to Limbu,
    with the numeric value 9 shows a distinct absence of bias toward a
    particular culture.
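
    A minimal sketch of that association, assuming Python's standard
    `unicodedata` module as the way to read the UCD (the dictionary and
    the `describe` helper are illustrative names, not anything defined by
    Unicode):

```python
import unicodedata

# DIGIT NINE from each script named above.
DIGIT_NINES = {
    "Latin": "\u0039",
    "Arabic-Indic": "\u0669",
    "Oriya": "\u0B6F",
    "Lao": "\u0ED9",
    "Ethiopic": "\u1371",
    "Limbu": "\u194F",
}


def describe(ch):
    """Return (General_Category, numeric value) for one character."""
    return unicodedata.category(ch), unicodedata.numeric(ch)


for script, ch in DIGIT_NINES.items():
    category, value = describe(ch)
    print(f"{script:13} U+{ord(ch):04X} {category} {value}")
```

    Every entry reports the numeric value 9, although the Ethiopic
    character is classified No rather than Nd.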

    As Mark said, the minority of cultures that use a number system other
    than base-10 positional can still have their numbers represented in
    Unicode, but software that wishes to interpret numeric values using such
    a system must handle it specially. Not every number system can be
    neatly encapsulated in UnicodeData.txt.
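
    As a rough illustration of "handle it specially", here is a sketch
    using Python's built-in `int()`, which accepts decimal digits (Nd)
    from any script but rejects other numeric characters. The
    `parse_decimal` helper is a hypothetical name introduced only for
    this example:

```python
def parse_decimal(text):
    """Return int(text), or None if the string is not a decimal numeral."""
    try:
        return int(text)
    except ValueError:
        return None


arabic_indic_92 = "\u0669\u0662"  # ARABIC-INDIC DIGIT NINE, DIGIT TWO (both Nd)
ethiopic_nine = "\u1371"          # ETHIOPIC DIGIT NINE (category No)

print(parse_decimal(arabic_indic_92))  # base-10 positional: parsed generically
print(parse_decimal(ethiopic_nine))    # not Nd: needs script-specific code
```

    The Arabic-Indic string parses as 92 with no script-specific logic;
    the Ethiopic character yields None, so software that wants its value
    has to implement the Ethiopic system itself.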

    > It's a mistake because, even in my culture, digit one followed by
    > digit two is not always interpreted as the number twelve. Phone
    > numbers and PINs are one exception. Version numbers such as "version
    > 12.12.12" are another exception.

    Those aren't numbers. Ha ha! Surprised? They are *character strings*
    that happen to consist (mostly) of digits.

    There is *no inherent numeric value* to a phone number, a PIN, a U.S.
    ZIP code, or a credit card number. They are just identifiers. You
    cannot get any meaningful result by performing arithmetic operations on
    them, unless of course you care who was assigned a PIN immediately
    before you (but even then, you could do the same thing with alphabetic
    identifiers). Phone number assignment is anything but sequential.

    In fact, interpreting such an identifier strictly as a number can lead
    to problems if leading zeros are dropped. The Boston ZIP code "02101"
    cannot be correctly rendered as "2101" even though that is numerically
    equivalent.
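
    The leading-zero problem is easy to reproduce. A small sketch in
    Python (the variable names are just for illustration):

```python
zip_code = "02101"

as_number = int(zip_code)       # treats the identifier as a quantity
round_tripped = str(as_number)  # the leading zero is gone

print(repr(zip_code), "->", as_number, "->", repr(round_tripped))
print("round trip preserved the ZIP code:", round_tripped == zip_code)
```

    Storing and comparing the value as a string avoids the loss entirely.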

    In a software version number such as "12.12.12", the individual twelves
    are decimal numbers, but the full stop between them is not a normal
    decimal point. Each component "12" is a number, but the complete
    eight-character string is not.
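
    One way to sketch this distinction in Python, assuming a purely
    numeric dotted version string (`parse_version` and
    `is_single_number` are hypothetical helper names):

```python
def parse_version(version):
    """Split a dotted version string into a tuple of integer components."""
    return tuple(int(part) for part in version.split("."))


def is_single_number(text):
    """Report whether the whole string parses as one decimal number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


print(parse_version("12.12.12"))     # each component is a number
print(is_single_number("12.12.12"))  # the full string is not
print(parse_version("1.10.0") > parse_version("1.9.0"))
```

    Comparing the tuples orders 1.10.0 after 1.9.0, which neither a
    plain string comparison nor a floating-point reading would do.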

    In my culture at least, we even misuse the term "number" to refer to
    ALPHANUMERIC identifiers. For example, we speak of the "serial number"
    on a dollar bill, a "driver's license number," or a "license plate
    number." All of these typically contain one or more letters.

    People always look at me like I'm crazy when I say these aren't numbers,
    but they aren't.

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


