Unihan number types and values

From: M.-A. Lemburg (mal@egenix.com)
Date: Mon Nov 29 2010 - 15:17:43 CST


    Hello,

    in Python we have come across a possible inconsistency with respect
    to the way code points are classified as having numeric properties in
    the Unihan database.

    I'd like to get information on whether this is intentional or
    just a side-effect of the Unihan database using a different
    approach to number type classification than the UCD.

    In the UCD, the numeric type is defined in:

    http://www.unicode.org/reports/tr44/#Numeric_Type

    That is, there are three types: Decimal (decimal digits, which can
    be used to parse decimal-radix numbers); Digit (characters which
    represent decimal digits but require special handling, e.g.
    superscript digits); and Numeric (which can mean anything from
    single digits to fractions and multi-digit numbers).
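
    As a rough illustration of the three types, here is a minimal sketch
    using Python's standard unicodedata module. The helper name
    numeric_type is hypothetical, and the classification is only an
    approximation derived from what unicodedata exposes, not a lookup of
    the Numeric_Type property itself.

```python
import unicodedata

def numeric_type(ch):
    """Approximate the UCD numeric type of a single character.

    Hypothetical helper: checks the narrowest type first, because a
    decimal digit also has digit and numeric values.
    """
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return None

# ASCII five, superscript two, vulgar fraction one half, a plain letter
for ch in ("5", "\u00b2", "\u00bd", "x"):
    print(f"U+{ord(ch):04X}", numeric_type(ch))
```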

    In Unihan, the number code points are defined using:

    http://www.unicode.org/reports/tr44/#Numeric_Type_Han

    That is, all code points with numeric values are grouped under
    the Numeric type, with an additional separation into accounting,
    primary and other numeric use (kAccountingNumeric, kPrimaryNumeric
    and kOtherNumeric).

    The typically used Chinese and Japanese code points for
    numeric digits fall into the Unihan range:

    http://www.wordiq.com/definition/Chinese_numerals
    http://en.wikipedia.org/wiki/Chinese_numerals

    Question: Why don't these code points have the "Nd" category?

    See this list for the 5.2.0 group of Nd code points:

    http://www.unicode.org/Public/5.2.0/ucd/extracted/DerivedNumericType.txt
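
    To make the question concrete, the following sketch prints what
    Python's unicodedata module reports for the ideographs commonly used
    as the digits one to nine. The choice of these nine characters is my
    assumption of what "typically used" means here; results depend on the
    Unicode version bundled with the interpreter.

```python
import unicodedata

# Ideographs commonly used for 1-9 (an assumed, illustrative selection)
han_digits = "一二三四五六七八九"

for ch in han_digits:
    print(
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),        # general category, not "Nd"
        unicodedata.decimal(ch, None),   # None: no decimal digit value
        unicodedata.numeric(ch, None),   # numeric value taken from Unihan
    )
```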

    Related to this, it is also unclear what to use as official zero
    for these number systems (U+3007 is often recommended).

    Finally, unlike many of the other digit code point sequences
    in the UCD, there doesn't appear to be such a sequence for
    Chinese decimal digits (apart from the incomplete vertical variant
    U+3021 - U+3029, which lacks the zero).
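
    A practical consequence of the missing sequence is that a parser
    cannot compute a digit value as an offset from a zero code point and
    needs an explicit table instead. The sketch below assumes U+3007 as
    the zero and purely positional notation; both are assumptions for
    illustration, and parse_positional is a hypothetical helper.

```python
import unicodedata

# Hand-built digit table: U+3007 as zero plus the ideographs for 1-9.
# Treating U+3007 as "the" zero is an assumption, not a Unicode rule.
HAN_DIGITS = "\u3007一二三四五六七八九"
DIGIT_VALUE = {ch: int(unicodedata.numeric(ch)) for ch in HAN_DIGITS}

def parse_positional(text):
    """Parse a purely positional digit string (hypothetical helper)."""
    value = 0
    for ch in text:
        if ch not in DIGIT_VALUE:
            raise ValueError(f"not a positional Han digit: U+{ord(ch):04X}")
        value = value * 10 + DIGIT_VALUE[ch]
    return value

# The digit code points are scattered rather than contiguous.
print([f"U+{ord(ch):04X}" for ch in HAN_DIGITS])

# U+3021..U+3029 is contiguous but only covers the values one to nine.
print([unicodedata.numeric(chr(cp)) for cp in range(0x3021, 0x302A)])

# Digits two, zero, one, zero written positionally
print(parse_positional("\u4e8c\u3007\u4e00\u3007"))
```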

    Thanks,

    -- 
    Marc-Andre Lemburg
    eGenix.com
    

