From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Nov 29 2010 - 16:24:30 CST
Marc-Andre Lemburg asked:
> Question: Why don't these code points have the "Nd" category ?
Because the General_Category=Nd value (and Numeric_Type=Decimal)
is explicitly limited to ordinary decimal digits that are used in
decimal radix expressions *and* which are encoded in a contiguous
sequence 0..9. See the character encoding stability policies
for the recent expression of this constraint:
http://www.unicode.org/policies/stability_policy.html#Property_Stability
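As an illustration of the contiguity constraint, here is a minimal Python sketch that groups every gc=Nd code point into runs of adjacent code points and checks that each run is made of complete 0..9 sets. It assumes the standard-library unicodedata module, whose data reflects whatever Unicode version the interpreter ships with; the helper names nd_runs and run_is_digit_sets are invented for this example.

```python
import sys
import unicodedata


def nd_runs():
    """Group all gc=Nd code points into runs of consecutive code points."""
    runs = []
    for cp in range(sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) == "Nd":
            if runs and runs[-1][-1] == cp - 1:
                runs[-1].append(cp)   # extends the current run
            else:
                runs.append([cp])     # starts a new run
    return runs


def run_is_digit_sets(run):
    """True if the run is a whole number of ten-digit sets valued 0..9 in order."""
    if len(run) % 10 != 0:
        return False
    return all(unicodedata.decimal(chr(cp)) == i % 10
               for i, cp in enumerate(run))


if __name__ == "__main__":
    runs = nd_runs()
    print(len(runs), "runs of gc=Nd code points")
    print("all runs are contiguous 0..9 sets:",
          all(run_is_digit_sets(r) for r in runs))
```

Some runs are longer than ten because several digit sets sit back to back (the mathematical digit styles, for instance), which is why the check works modulo 10.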
The Han numeric ideographs fail the latter test. And it
would be inadvisable to process them as gc=Nd anyway, because
they are quite often used in traditional numbering in
East Asia, which does not use decimal radix forms. Handling
Han numeric ideographs requires special processing to
parse numeric values correctly.
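To make "special processing" concrete, here is a rough Python sketch of a parser for traditional (non-positional) Han numbers, where multiplier ideographs such as 十, 百, 千 and 万 carry the place value. The tables and the function name parse_han_number are assumptions made for this example. It only covers simple values with units up to 億 and does not attempt accounting forms, fractions, or regional differences in large-number units.

```python
# Hypothetical lookup tables for this sketch only; not a complete inventory.
DIGITS = {"〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "萬": 10**4, "億": 10**8, "亿": 10**8}


def parse_han_number(text):
    """Parse a traditional Han number such as 二百三十四 into an int."""
    total = 0    # sections already closed by a large unit (万, 億)
    section = 0  # value accumulated since the last large unit
    digit = 0    # most recent digit still waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit (十 with no leading digit) counts as one of itself.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += (section + digit) * LARGE_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"not a Han numeral: {ch!r}")
    return total + section + digit


if __name__ == "__main__":
    for s in ["十五", "二百三十四", "一千零五", "三万五千"]:
        print(s, parse_han_number(s))
```

Note that no step here treats a character as a digit in a fixed decimal position, which is exactly why gc=Nd handling would give wrong answers for such strings.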
> Related to this, it is also unclear what to use as official zero
> for these number systems (U+3007 is often recommended).
In addition to John Jenkins's clarification, I would point out
that when Han ideographs *are* used in decimal radix
expressions, the usual choice for a zero *digit* is U+3007.
U+96F6 expresses the *concept* of zero. In other words,
it is more akin to "zero" than to "0", and would seldom
be seen used in numerical expressions.
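For the decimal radix case, a small Python sketch follows. It treats U+3007 as the zero digit in a positional string such as 二〇一〇. The digit table and the function name han_positional_to_int are assumptions for this example. It deliberately rejects U+96F6, following the usage described above, and a more lenient parser might choose otherwise.

```python
import unicodedata

# Index in this string is the digit value; U+3007 is the zero digit.
HAN_DIGITS = "\u3007一二三四五六七八九"


def han_positional_to_int(text):
    """Convert a positional Han digit string (e.g. 二〇一〇) to an int."""
    if not text:
        raise ValueError("empty digit string")
    value = 0
    for ch in text:
        idx = HAN_DIGITS.find(ch)
        if idx < 0:
            raise ValueError(f"not a positional Han digit: {ch!r}")
        value = value * 10 + idx
    return value


if __name__ == "__main__":
    print(han_positional_to_int("二〇一〇"))
    # Neither character is gc=Nd: U+3007 is a letter number, U+96F6 a letter.
    for ch in ("\u3007", "\u96f6"):
        print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```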
A postscript about the Numeric_Value and Numeric_Type properties:
Both are derived from values in UnicodeData.txt together with
numeric tags from the Unihan Database. They are not "simple properties"
in the sense of definition D45 in Section 3.5, Properties
of the Unicode Standard. See the end of Section 5.4, Derived
Extracted Properties in UAX #44 for the best current statement
of how they are actually derived.
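As a rough way to inspect these properties from code, here is a Python sketch that approximates Numeric_Type from the three numeric lookups in the standard-library unicodedata module. This is an inference from which lookups succeed, not a direct read of the property, and the function name numeric_type is made up for the example. The result for the Han ideograph relies on the interpreter's data including the Unihan-derived values.

```python
import unicodedata


def numeric_type(ch):
    """Approximate Numeric_Type for a single character.

    decimal() succeeding is read as Decimal, digit() alone as Digit,
    and numeric() alone as Numeric; this is an assumption of the sketch.
    """
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"


if __name__ == "__main__":
    for ch in ["5", "\u00b2", "\u00bd", "一", "A"]:
        print(f"U+{ord(ch):04X}", numeric_type(ch),
              unicodedata.numeric(ch, None))
```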
--Ken