From: Richard Wordingham <richard.wordingham_at_ntlworld.com>

Date: Thu, 12 Jul 2012 01:41:33 +0100

What is a number having a numeric type of "digit" meant to convey?

The old Unicode 2.0 definition of "digit value" seemed clear:

"Digit value. This is a numeric field. If the character represents a

digit, not necessarily a decimal digit, the value is here. This covers

digits which do not form decimal radix forms, such as the compatibility

superscript digits. This field is informative."

That definition seems to be gone. From what we now have, I can think

of several meanings, e.g.:

1) It's a digit in a system of decimal place notation, but doesn't quite

qualify for some reason. Typical examples are:

a) U+19DA NEW TAI LUE THAM DIGIT ONE - cruelly denied "decimal" status

because it wasn't assigned with 9 clones of the other Tai Lue digits.

b) U+2070 SUPERSCRIPT ZERO - not in a contiguous range, and

apparent possibility of misleading parsers

c) U+2080 SUBSCRIPT ZERO - apparent possibility of misleading parsers

2) It's in a decimal system, but not with place notation:

a) U+10E60 RUMI DIGIT ONE. By contrast U+10E69 RUMI NUMBER

TEN is a mere "numeric" - possibly because numeric field values of

blank for decimal digit value (field 6 in UnicodeData.txt), 1 as the

digit value (field 7) and 10 as the value (field 8) would be too

confusing, as well as contrary to the current rules.

On the other hand, I don't see why, apart from a general disapproval of

compatibility characters, the Roman numerals U+2170 SMALL ROMAN

NUMERAL ONE to U+2178 SMALL ROMAN NUMERAL NINE don't count as digits.

3) It's derived from a decimal digit, e.g. U+2468 CIRCLED DIGIT

NINE is "digit", whereas the next in the series, U+2469 CIRCLED NUMBER

TEN, just has a numeric type of "numeric".
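For anyone wanting to check the characters above, here is a minimal sketch using Python's standard unicodedata module, which exposes the three values from fields 6, 7 and 8 of UnicodeData.txt as decimal(), digit() and numeric(). The helper names numeric_fields and numeric_type are my own, not part of any library, and the results depend on the Unicode version bundled with the Python build (see unicodedata.unidata_version).

```python
import unicodedata


def numeric_fields(ch):
    """Return (decimal, digit, numeric) for one character.

    Each entry is None when the lookup has no value for the character.
    Hypothetical helper, not part of unicodedata.
    """
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


def numeric_type(ch):
    """Classify a character by the most specific numeric field it has."""
    decimal, digit, numeric = numeric_fields(ch)
    if decimal is not None:
        return "decimal"
    if digit is not None:
        return "digit"
    if numeric is not None:
        return "numeric"
    return "none"


# Code points taken from the examples in this message, plus U+0031 for contrast.
for cp in (0x0031, 0x19DA, 0x2070, 0x10E60, 0x10E69, 0x2170, 0x2468, 0x2469):
    ch = chr(cp)
    name = unicodedata.name(ch, "?")
    print(f"U+{cp:04X} {name:32} {numeric_type(ch):8} {numeric_fields(ch)}")
```

With a reasonably recent Unicode database this should report U+2468 as "digit" and U+2469 as "numeric", matching the split described in 3) above.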

----

It's not clear to me why the following decimal digits (in the normal,
not the Unicode sense) are not classified as "digit" but just as
"numeric":

U+1D360 COUNTING ROD UNIT DIGIT ONE
U+3021 HANGZHOU NUMERAL ONE

The only reason for U+1D369 COUNTING ROD TENS DIGIT ONE not to be a
digit that I can think of is that the system is conceived of as a
centesimal system. The counting rods 'UNIT' and 'TENS' digits are used
alternately to avoid misreading, with various methods for indicating
zero.

Likewise, why are U+0C79 TELUGU FRACTION DIGIT ONE FOR ODD POWERS OF
FOUR and related characters not digits? Is it because they are a base 4
(or collectively hexadecimal) system?

Perhaps some light can be shed on the system by learning what people
actually use the numeric types and (decimal) digit values for.

Richard.

Received on Wed Jul 11 2012 - 19:46:13 CDT

