L2/03-043

Source: Mark Davis
Date: Feb 11, 2003
Title: Decimal Discrepancy

We have a couple of different ways that numbers are categorized in the UnicodeData file:

1. General_Category: Decimal_Digit_Number (Nd), Letter_Number (Nl), and Other_Number (No).
2. Numeric_Type: Decimal_Digit, Digit, and Numeric (based on the values in fields 6-8).

Now, one would expect Decimal_Digit_Number and Decimal_Digit to be the same set of characters. But they are not. The extracted data file shows the discrepancies:

http://www.unicode.org/Public/3.2-Update/extracted/DerivedNumericType-3.2.0.txt

In particular, 10 Decimal_Digit characters (out of the 268) are not Decimal_Digit_Number:

2070       ; decimal # No       SUPERSCRIPT ZERO
00B9       ; decimal # No       SUPERSCRIPT ONE
00B2..00B3 ; decimal # No   [2] SUPERSCRIPT TWO..THREE
2074..2079 ; decimal # No   [6] SUPERSCRIPT FOUR..NINE

I don't recall there being any principle behind this result; I suspect it was just an oversight. I'd recommend that we either change the General_Category for these characters to Decimal_Digit_Number, or change the Numeric_Type to Digit.

Mark
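
The comparison described above can be checked mechanically. The following is a minimal sketch, not part of the original memo: it expands the four listing lines quoted above into individual code points and, separately, scans a Unicode database for characters that carry a decimal value but whose General_Category is not Nd. The names `MEMO_LINES`, `expand`, and `decimal_but_not_nd` are hypothetical, chosen for illustration. It assumes Python's standard `unicodedata` module, whose default data reflects whichever Unicode version the interpreter ships with, so a scan of the default database need not match the 3.2 figures.

```python
import unicodedata

# The four discrepancy lines quoted in the memo (code point or range first).
MEMO_LINES = """\
2070 ; decimal # No SUPERSCRIPT ZERO
00B9 ; decimal # No SUPERSCRIPT ONE
00B2..00B3 ; decimal # No [2] SUPERSCRIPT TWO..THREE
2074..2079 ; decimal # No [6] SUPERSCRIPT FOUR..NINE
"""


def expand(lines):
    """Expand 'XXXX' or 'XXXX..YYYY' entries into a sorted list of code points."""
    code_points = []
    for line in lines.splitlines():
        field = line.split(";")[0].strip()
        if not field:
            continue
        start, _, end = field.partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        code_points.extend(range(first, last + 1))
    return sorted(code_points)


def decimal_but_not_nd(db=unicodedata):
    """Return code points that have a decimal value but are not category Nd.

    `db` is any object exposing decimal() and category(), such as the
    unicodedata module itself or its ucd_3_2_0 snapshot.
    """
    return [
        cp
        for cp in range(0x110000)
        if db.decimal(chr(cp), None) is not None
        and db.category(chr(cp)) != "Nd"
    ]


listed = expand(MEMO_LINES)
print(len(listed))
print([f"{cp:04X}" for cp in listed])
```

The expanded list gives the ten code points the memo counts (1 + 1 + 2 + 6). To run the scan against Unicode 3.2 data rather than the interpreter's current data, one option is `decimal_but_not_nd(unicodedata.ucd_3_2_0)`; whether that snapshot reproduces the decimal fields exactly as in the file cited above is an assumption worth verifying before relying on the result.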