L2/03-043
Source: Mark Davis
Date: Feb 11, 2003
Title: Decimal Discrepancy
We have a couple of different ways that numbers are categorized in the
UnicodeData file:
1. General_Category: Decimal_Digit_Number (Nd), Letter_Number (Nl), and
Other_Number (No).
2. Numeric_Type: Decimal_Digit, Digit, and Numeric (based on the values in
fields 6-8.)
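As a rough illustration of how Numeric_Type follows from fields 6-8, here is a minimal Python sketch. It assumes the usual semicolon-separated UnicodeData layout with fields counted from 0 (field 6 decimal digit value, field 7 digit value, field 8 numeric value) and takes the first non-empty field as deciding the type. The function name numeric_type and the sample records are illustrative only.

```python
def numeric_type(record):
    """Derive a Numeric_Type name from one UnicodeData-style record.

    Assumption: fields are semicolon-separated and counted from 0, with
    field 6 = decimal digit value, field 7 = digit value, field 8 = numeric
    value. The first non-empty field decides the type.
    """
    fields = record.split(";")
    decimal, digit, numeric = fields[6], fields[7], fields[8]
    if decimal:
        return "Decimal_Digit"
    if digit:
        return "Digit"
    if numeric:
        return "Numeric"
    return None  # no numeric value at all


# Illustrative records in the UnicodeData layout (not a full data file).
SAMPLES = [
    "0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;",
    "2460;CIRCLED DIGIT ONE;No;0;ON;<circle> 0031;;1;1;N;;;;;",
    "00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;",
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
]

for rec in SAMPLES:
    code, name, gc = rec.split(";")[:3]
    print(code, gc, numeric_type(rec), name)
```

Printing the General_Category next to the derived type makes it easy to see that the two classifications come from different fields of the same record.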
Now, one would expect that Decimal_Digit_Number and Decimal_Digit would
be the same set of characters. But they are not. The extracted data file
shows the discrepancies:
http://www.unicode.org/Public/3.2-Update/extracted/DerivedNumericType-3.2.0.txt
In particular, 10 Decimal_Digit characters (out of the 268) are not
Decimal_Digit_Number:
2070 ; decimal # No SUPERSCRIPT ZERO
00B9 ; decimal # No SUPERSCRIPT ONE
00B2..00B3 ; decimal # No [2] SUPERSCRIPT TWO..THREE
2074..2079 ; decimal # No [6] SUPERSCRIPT FOUR..NINE
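The count of 10 can be checked mechanically from the four lines above. The following Python sketch assumes each line has the shape "code or range ; type # General_Category ...", expands the ranges, and keeps every code point whose type is decimal but whose General_Category is not Nd. The names LISTING, LINE, and expand are hypothetical helpers for this illustration.

```python
import re

# The four lines quoted above, taken as-is from this memo.
LISTING = """\
2070 ; decimal # No SUPERSCRIPT ZERO
00B9 ; decimal # No SUPERSCRIPT ONE
00B2..00B3 ; decimal # No [2] SUPERSCRIPT TWO..THREE
2074..2079 ; decimal # No [6] SUPERSCRIPT FOUR..NINE
"""

# Assumed line shape: "XXXX[..YYYY] ; type # Gc rest-of-comment"
LINE = re.compile(
    r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)\s*#\s*(\w\w)\b"
)


def expand(listing):
    """Return (code point, numeric type, general category) per character."""
    out = []
    for line in listing.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip blank or unrecognized lines
        first = int(m.group(1), 16)
        last = int(m.group(2) or m.group(1), 16)
        for cp in range(first, last + 1):
            out.append((cp, m.group(3), m.group(4)))
    return out


# Characters typed as decimal whose General_Category is not Nd.
mismatches = sorted(
    cp for cp, ntype, gc in expand(LISTING)
    if ntype == "decimal" and gc != "Nd"
)

print(len(mismatches))
print(" ".join(f"{cp:04X}" for cp in mismatches))
```

Run against the full extracted file rather than this excerpt, the same filter should single out these superscript digits, assuming the file's comment column carries the General_Category as it does in the lines quoted here.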
I don't recall there being any principle behind this result; I suspect it
was just an oversight. I'd recommend that either we change the
General_Category for these characters to Decimal_Digit_Number, or change the
Numeric_Type to Digit.
Mark