Meaning of Numeric Type "digit"

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 12 Jul 2012 01:41:33 +0100

What is a number having a numeric type of "digit" meant to convey?

The old Unicode 2.0 definition of "digit value" seemed clear:
"Digit value. This is a numeric field. If the character represents a
digit, not necessarily a decimal digit, the value is here. This covers
digits which do not form decimal radix forms, such as the compatibility
superscript digits. This field is informative."

That definition seems to be gone. From what we now have, I can think
of several meanings, e.g.:

1) It's a digit in a system of decimal place notation, but doesn't quite
qualify for some reason. Typical examples are:

a) U+19DA NEW TAI LUE THAM DIGIT ONE - cruelly denied "decimal" status
because it wasn't assigned with 9 clones of the other Tai Lue digits.

b) U+2070 SUPERSCRIPT ZERO - not in a contiguous range, and an
apparent possibility of misleading parsers

c) U+2080 SUBSCRIPT ZERO - apparent possibility of misleading parsers

2) It's in a decimal system, but not with place notation:

a) U+10E60 RUMI DIGIT ONE. By contrast U+10E69 RUMI NUMBER
TEN is a mere "numeric" - possibly because numeric field values of
blank for decimal digit value (field 6 in UnicodeData.txt), 1 as the
digit value (field 7) and 10 as the value (field 8) would be too
confusing, as well as contrary to the current rules.

On the other hand, I don't see why, apart from a general disapproval of
compatibility characters, the Roman numerals U+2170 SMALL ROMAN
NUMERAL ONE to U+2178 SMALL ROMAN NUMERAL NINE don't count as digits.

3) It's derived from a decimal digit, e.g. U+2468 CIRCLED DIGIT
NINE is "digit", whereas the next in the series, U+2469 CIRCLED NUMBER
TEN, just has a numeric type of "numeric".
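
The three numeric fields for the characters cited above can be inspected
directly. The following is a minimal sketch, assuming Python's standard
unicodedata module; the helper names numeric_fields and numeric_type are
made up for illustration, and the results depend on the Unicode version
bundled with the interpreter.

```python
import unicodedata


def numeric_fields(ch):
    """Return (decimal, digit, numeric) for ch; None stands for a blank field.

    These correspond to fields 6, 7 and 8 of UnicodeData.txt.
    """
    return (
        unicodedata.decimal(ch, None),
        unicodedata.digit(ch, None),
        unicodedata.numeric(ch, None),
    )


def numeric_type(ch):
    """Classify ch by the most specific numeric field that is filled in."""
    decimal, digit, numeric = numeric_fields(ch)
    # Compare against None explicitly: a value of 0 is a legitimate digit.
    if decimal is not None:
        return "decimal"
    if digit is not None:
        return "digit"
    if numeric is not None:
        return "numeric"
    return "none"


# Code points taken from the examples in the list above, plus U+0039 for contrast.
SAMPLES = (0x0039, 0x19DA, 0x2070, 0x2080, 0x10E60, 0x10E69,
           0x2170, 0x2468, 0x2469)

for cp in SAMPLES:
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: "
          f"{numeric_type(ch)} {numeric_fields(ch)}")
```

Running it lists, for each character, which of the three fields are
blank, which is all that separates "decimal", "digit" and "numeric".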

----
It's not clear to me why the following decimal digits (in the normal,
not the Unicode sense) are not classified as "digit" but just as
"numeric":

U+1D360 COUNTING ROD UNIT DIGIT ONE
U+3021 HANGZHOU NUMERAL ONE

The only reason I can think of for U+1D369 COUNTING ROD TENS DIGIT ONE
not to be a digit is that the system is conceived of as a centesimal
system. The counting rod 'UNIT' and 'TENS' digits are used alternately
to avoid misreading, with various methods for indicating zero.

Likewise, why are U+0C79 TELUGU FRACTION DIGIT ONE FOR ODD POWERS OF
FOUR and related characters not digits?  Is it because they are a base
4 (or collectively hexadecimal) system?

Perhaps some light can be shed on the system by learning what people
actually use the numeric types and (decimal) digit values for.
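
One concrete consumer of these properties is Python's own string
handling, offered here only as a single data point. The sketch below
assumes the documented behaviour of str.isdecimal(), str.isdigit(),
str.isnumeric() and int(); the helper name usage_profile is
hypothetical.

```python
def usage_profile(ch):
    """Report how Python's built-in string handling treats one character."""
    try:
        parsed = int(ch)  # int() only accepts decimal digits
    except ValueError:
        parsed = None
    return {
        "isdecimal": ch.isdecimal(),
        "isdigit": ch.isdigit(),
        "isnumeric": ch.isnumeric(),
        "int": parsed,
    }


# A decimal digit, a "digit", and two characters that are only "numeric".
for ch in ("\u0663", "\u2468", "\u2469", "\u3021"):
    print(f"U+{ord(ch):04X}", usage_profile(ch))
```

In this one implementation, the "digit" type widens what isdigit()
accepts but does not make a character parseable as an integer.
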
Richard.
Received on Wed Jul 11 2012 - 19:46:13 CDT
