Re: Unihan number types and values

From: Mark Davis ☕ (mark@macchiato.com)
Date: Mon Nov 29 2010 - 16:14:23 CST

    The numeric types, including Han, are supplied in:

    http://unicode.org/Public/6.0.0/ucd/extracted/DerivedNumericType.txt
    http://unicode.org/Public/6.0.0/ucd/extracted/DerivedNumericValues.txt
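
For a quick check without downloading those files, the following is a minimal sketch that approximates Numeric_Type using Python's standard `unicodedata` module. The helper name `numeric_type` is hypothetical, and the results depend on the Unicode version bundled with the Python build in use, so treat it as an illustration rather than a substitute for the derived data files.

```python
import unicodedata

def numeric_type(ch):
    """Approximate the UCD Numeric_Type of one character (hypothetical helper)."""
    # decimal() succeeds only for Numeric_Type=Decimal characters.
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    # digit() also succeeds for Numeric_Type=Digit (e.g. superscript digits).
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    # numeric() succeeds for anything with a numeric value, including Han.
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"

# ASCII digit, superscript two, one half, ideographic zero,
# the Han numeral five, and Hangzhou numeral five.
for ch in "5\u00b2\u00bd\u3007\u4e94\u3025":
    print(f"U+{ord(ch):04X}",
          unicodedata.category(ch),
          numeric_type(ch),
          unicodedata.numeric(ch, None))
```

With a recent Python, the Han numerals should report a numeric value while failing the decimal and digit checks, which matches their listing as Numeric rather than Decimal in the derived files.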

    Mark

    *— Il meglio è l’inimico del bene —*

    On Mon, Nov 29, 2010 at 13:17, M.-A. Lemburg <mal@egenix.com> wrote:

    > Hello,
    >
    > in Python we have come across a possible inconsistency with respect
    > to the way code points are classified as having numeric properties in
    > the Unihan database.
    >
    > I'd like to get information on whether this is intentional or
    > just a side-effect of the Unihan database using a different
    > approach to number type classification than the UCD.
    >
    > In the UCD, the number type is defined as:
    >
    > http://www.unicode.org/reports/tr44/#Numeric_Type
    >
    > That is, there are three types: "Decimal" (decimal digits that can be
    > used directly to parse decimal-radix numbers); "Digit" (characters that
    > represent decimal digits but require special handling, e.g. superscript
    > digits); and "Numeric" (anything else with a numeric value, from single
    > digits to fractions and multi-digit numbers).
    >
    > In Unihan, the number code points are defined using:
    >
    > http://www.unicode.org/reports/tr44/#Numeric_Type_Han
    >
    > That is, all code points with a numeric value are grouped under the
    > single numeric type "Numeric", with an additional separation by use:
    > accounting, primary numeric, and other numeric.
    >
    > The typically used Chinese and Japanese code points for
    > numeric digits fall into the Unihan range:
    >
    > http://www.wordiq.com/definition/Chinese_numerals
    > http://en.wikipedia.org/wiki/Chinese_numerals
    >
    > Question: Why don't these code points have the "Nd" category?
    >
    > See this list for the 5.2.0 group of Nd code points:
    >
    > http://www.unicode.org/Public/5.2.0/ucd/extracted/DerivedNumericType.txt
    >
    > Related to this, it is also unclear which code point to use as the
    > official zero for these number systems (U+3007 is often recommended).
    >
    > Finally, unlike many of the other digit code point sequences
    > in the UCD, there doesn't appear to be such a sequence for
    > Chinese decimal digits (apart from the incomplete vertical variant
    > U+3021 - U+3029, which lacks the zero).
    >
    > Thanks,
    > --
    > Marc-Andre Lemburg
    > eGenix.com



    This archive was generated by hypermail 2.1.5 : Mon Nov 29 2010 - 16:16:40 CST