Re: Decimal digit property - What's it for?

From: Jim Allan (jallan@smrtytrek.com)
Date: Thu Nov 27 2003 - 13:56:24 EST

    Arcane Jill wrote:

    > It has been explained to me that the "decimal digit" property has the
    > following meaning: "Decimal numbers are those used in decimal-radix
    > number systems. In particular, the sequence of the ONE character
    > followed by the TWO character is interpreted as having the value of
    > twelve".

    I don't agree with that explanation.

    If I use isdigit() in C or a corresponding function in another language
    to check a character, I expect to find out only whether or not that
    character is a decimal digit. I won't know whether it is being used as
    part of a decimal-radix number or not.
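
    As a rough sketch of such a per-character check, here it is in Python,
    using the standard unicodedata module. The helper names
    is_decimal_digit and digit_kinds are made up for illustration. The
    check says nothing about how the character is being used, only which
    numeric properties it carries.

```python
import unicodedata

def is_decimal_digit(ch):
    """True if ch has General_Category Nd (decimal digit)."""
    return unicodedata.category(ch) == "Nd"

def digit_kinds(ch):
    """Return (decimal, digit, numeric) values for ch, None where absent."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )

if __name__ == "__main__":
    # ASCII seven, ARABIC-INDIC DIGIT THREE, SUPERSCRIPT TWO, ROMAN NUMERAL EIGHT
    for ch in ("7", "\u0663", "\u00b2", "\u2167"):
        print(unicodedata.name(ch), is_decimal_digit(ch), digit_kinds(ch))
```

    SUPERSCRIPT TWO has a "digit" value but no "decimal" value, and ROMAN
    NUMERAL EIGHT has only a "numeric" value, so neither passes the
    decimal digit check.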

    > I mean, it's quite clearly ignored in sentences like "My phone number is
    > 0044-1727-6000000", or "The codepoint of the space character is U+0020".
    >
    Many languages and applications allow use of a filter template such as
    "9999-999-9999999" or "####-####-#######" in which the figure "9" or "#"
    in the template must be filled by a decimal digit in the data.

    Allowing *only* decimal digits (and additional template characters) in a
    field is often useful. Keycodes and product codes often contain
    particular positions that must be alphabetic and other positions where
    only decimal digits are allowed.
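
    A minimal sketch of such a template filter in Python, assuming that
    "9" and "#" are the only placeholder characters and that every other
    template character must match the data literally. The function name
    matches_template is hypothetical, and real template languages differ
    in their details.

```python
import unicodedata

# Assumption: these two characters stand for "any decimal digit".
DIGIT_PLACEHOLDERS = {"9", "#"}

def matches_template(template, data):
    """Check data against template position by position."""
    if len(template) != len(data):
        return False
    for t, d in zip(template, data):
        if t in DIGIT_PLACEHOLDERS:
            # Placeholder position: require a decimal digit (category Nd).
            if unicodedata.category(d) != "Nd":
                return False
        elif t != d:
            # Literal position: the data must repeat the template character.
            return False
    return True

if __name__ == "__main__":
    print(matches_template("####-####-#######", "0044-1727-6000000"))
    print(matches_template("####-####-#######", "0044-172\u2166-6000000"))
```

    The second call fails because ROMAN NUMERAL SEVEN is numeric but not a
    decimal digit.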

    > What possible use could any mechanical algorithm make of the "decimal
    > digit" property that it could not equally well make of the "digit" or
    > "numeric" properties?

    We hardly want to allow Roman numeral characters in a field that we are
    going to evaluate as though it were decimal. If we are interpreting a
    field as a radix 10 number it is reasonable to validate the field as
    containing only radix 10 characters (and allowed numeric separators)
    preceded or followed by spaces.
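
    One way to sketch that validation in Python, assuming the comma is the
    only allowed separator and without checking where the separators
    fall. The name is_decimal_field is made up for illustration.
    str.isdecimal() is true only when the string is non-empty and every
    character is in category Nd.

```python
def is_decimal_field(field, separators=","):
    """True if field is decimal digits plus allowed separators,
    optionally preceded or followed by spaces."""
    core = field.strip(" ")
    for sep in separators:
        # Loose check: separators are simply dropped, wherever they occur.
        core = core.replace(sep, "")
    return core.isdecimal()

if __name__ == "__main__":
    for field in (" 1,234 ", "\u0661\u0662\u0663", "\u216b", "12a", "   "):
        print(repr(field), is_decimal_field(field))
```

    A field holding ROMAN NUMERAL TWELVE is rejected here, even though the
    character has a numeric value.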

    Generally a check on whether a character is a decimal digit is part of
    validation, whether validation of previously stored data or of data as
    it is being input. Of course, we will normally want a tighter
    validation. We probably won't want to allow a number that is composed of
    mixed Latin, Arabic and Hindu digits even though it can be evaluated.
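
    A tighter check of that kind might look like the following Python
    sketch. It assumes that each set of decimal digits occupies ten
    consecutive code points starting at its zero, so that subtracting a
    digit's value from its code point identifies the set it belongs to.
    The helper names are hypothetical.

```python
import unicodedata

def digit_zero(ch):
    """Code point of the zero of the digit set that ch belongs to.
    Assumes each digit set is a contiguous run of ten code points."""
    return ord(ch) - unicodedata.decimal(ch)

def uses_single_digit_set(text):
    """True if all decimal digits in text come from one digit set."""
    zeros = {
        digit_zero(ch)
        for ch in text
        if unicodedata.category(ch) == "Nd"
    }
    return len(zeros) <= 1

if __name__ == "__main__":
    print(uses_single_digit_set("123"))
    # ASCII one, ARABIC-INDIC DIGIT TWO, DEVANAGARI DIGIT THREE
    print(uses_single_digit_set("1\u0662\u0969"))
```

    Non-digit characters such as separators are ignored by this check, so
    it is meant to be combined with the other validation, not to replace
    it.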

    On the other hand, in a multi-lingual and multi-script environment it
    would be useful to ignore scripts in evaluating numbers just as one
    often ignores case in evaluating strings. Note that data is often
    supplied from a client in text format, say tab-delimited, with numbers
    in text format. It would be useful to verify such data by checking that
    the numbers are proper decimal numbers regardless of script before
    actually reading the data into another database where they might (or
    might not) be converted to binary format.
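
    As an illustration, here is a small Python sketch that evaluates
    decimal numbers regardless of script while reading a tab-delimited
    line. The names decimal_value and parse_row and the sample row are
    invented for the example. Fields that are not purely decimal digits
    are left as text.

```python
import unicodedata

def decimal_value(text):
    """Evaluate a string of decimal digits from any script as radix 10."""
    if not text:
        raise ValueError("empty field")
    value = 0
    for ch in text:
        # unicodedata.decimal raises ValueError for non-decimal characters.
        value = value * 10 + unicodedata.decimal(ch)
    return value

def parse_row(line):
    """Split a tab-delimited line, converting all-decimal fields to int."""
    fields = line.rstrip("\n").split("\t")
    return [decimal_value(f) if f.isdecimal() else f for f in fields]

if __name__ == "__main__":
    # ONE followed by TWO evaluates to twelve in each script.
    print(decimal_value("12"), decimal_value("\u0661\u0662"))
    print(parse_row("widget\t\u0967\u0968\n"))
```

    Python's built-in int() also accepts decimal digits from other
    scripts, so the explicit loop is only there to show the evaluation
    step.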

    Checking for decimal numbers is also useful in parsing addresses, which
    is necessary for address validation and address correction software.

    Jim Allan


