From: M.-A. Lemburg (firstname.lastname@example.org)
Date: Mon Nov 29 2010 - 15:17:43 CST
In Python, we have come across a possible inconsistency in the way
code points are classified as having numeric properties in the Unihan
database.
I'd like to get information on whether this is intentional or
just a side-effect of the Unihan database using a different
approach to number type classification than the UCD.
In the UCD, the number type is defined as:
That is, there are decimals (= decimal digits), which can be used to
parse decimal radix numbers; digits, which represent decimal digits but
require special handling (e.g. superscript digits); and numeric types,
which can mean anything from single digits to fractions and multi-digit
numbers.
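To illustrate the three UCD number types from the Python side, here is a
minimal sketch using the standard unicodedata module. The helper name
number_types and the three sample characters are my own choices for
illustration, not part of the UCD.

```python
import unicodedata


def number_types(ch):
    """Return (decimal, digit, numeric) for ch, using None where undefined."""
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


# DIGIT FIVE is a decimal, SUPERSCRIPT TWO is only a digit,
# VULGAR FRACTION ONE HALF is only numeric.
for ch in ("5", "\u00b2", "\u00bd"):
    print(f"U+{ord(ch):04X}", number_types(ch))
```

Each stricter type implies the looser ones: a decimal also has a digit
and a numeric value, while a fraction has only a numeric value.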
In Unihan, the number code points are defined using:
That is, all code points with numeric representations are grouped
in the numeric type category, with an additional separation into
accounting use, primary numeric use and other numeric use.
The typically used Chinese and Japanese code points for
numeric digits fall into the Unihan range:
Question: Why don't these code points have the "Nd" category?
See this list for the 5.2.0 group of Nd code points:
Related to this, it is also unclear what to use as the official zero
for these number systems (U+3007 is often recommended).
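The practical effect can be seen from Python. The sketch below assumes
the ten ideographs in CJK_DIGITS (U+3007 as zero, followed by the common
ideographs for one to nine) as the digit set; that choice, and the helper
names describe and cjk_to_int, are mine and carry no official status. It
relies on the numeric values that current Python 3 versions expose for
these code points.

```python
import unicodedata

# Assumed digit set: U+3007 as zero, then the ideographs for 1 to 9.
CJK_DIGITS = "〇一二三四五六七八九"


def describe(ch):
    """Return (general category, numeric value, str.isdecimal()) for ch."""
    return (unicodedata.category(ch), unicodedata.numeric(ch, None), ch.isdecimal())


def cjk_to_int(text):
    """Parse a positional string of the assumed digits via their numeric values."""
    value = 0
    for ch in text:
        if ch not in CJK_DIGITS:
            raise ValueError(f"not one of the assumed digits: {ch!r}")
        value = value * 10 + int(unicodedata.numeric(ch))
    return value


for ch in CJK_DIGITS:
    print(f"U+{ord(ch):04X}", describe(ch))
```

Because none of these code points is in category Nd, the built-in int()
rejects a string such as "二〇一〇", while the hand-written cjk_to_int
above can still interpret it positionally.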
Finally, unlike many of the other digit code point sequences
in the UCD, there doesn't appear to be such a sequence for
Chinese decimal digits (apart from the incomplete vertical variant
U+3021 - U+3029, which lacks the zero).
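As a rough check of what "such a sequence" means, the following sketch
tests whether ten consecutive code points carry the decimal values 0 to
9, and inspects the range U+3021 - U+3029 mentioned above. The function
name is_decimal_run is hypothetical.

```python
import unicodedata


def is_decimal_run(start):
    """True if code points start..start+9 have decimal values 0..9 in order."""
    return all(
        unicodedata.decimal(chr(start + i), None) == i for i in range(10)
    )


# U+3021 - U+3029: categories and numeric values of the nine code points.
hangzhou = [chr(cp) for cp in range(0x3021, 0x302A)]
hangzhou_categories = {unicodedata.category(ch) for ch in hangzhou}
hangzhou_values = [unicodedata.numeric(ch) for ch in hangzhou]

print(is_decimal_run(0x0030))  # ASCII digits
print(is_decimal_run(0x0660))  # Arabic-Indic digits
print(is_decimal_run(0x3020))  # would need a zero directly before U+3021
print(hangzhou_categories, hangzhou_values)
```

The nine code points have numeric values 1 to 9 but are letter numbers
(Nl), and the code point directly before them is not a zero, so they do
not form a decimal run in this sense.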
--
Marc-Andre Lemburg
eGenix.com