Non-decimal positional digits; was: Defined Private Use

From: Ernest Cline (ernestcline@mindspring.com)
Date: Wed Apr 28 2004 - 10:32:14 EDT


    > [Original Message]
    > From: Doug Ewell <dewell@adelphia.net>
    >
    > Ernest Cline <ernestcline at mindspring dot com> wrote:
    >
    > > TENGWAR DUODECIMAL DIGITS TEN and ELEVEN
    > > present an interesting problem. They are digits, but not
    > > decimal digits. Should the concept of General Category
    > > Nd be expanded to include non-decimal number systems?
    >
    > No, the "d" stands for Decimal. This category is deliberately limited
    > to characters that can be concatenated to form numbers in a base-10
    > positional number system. It's a fact of life that base-12 and base-16
    > digits are relegated to category No.

    I recognized that limitation in my choice of words. The fact is
    that, at present, Unicode does not encode any non-decimal digits.
    (The hex digits [a-f][A-F] don't count, because they aren't used
    exclusively as digits and don't have any numeric values assigned
    to them, aside from the ever-not-so-helpful Hex_Digit and
    ASCII_Hex_Digit properties.) Given that fact, it might make sense
    to provide a way for Unicode to indicate non-decimal positional
    digits. Extending Nd was one possibility I considered; another was
    to use the triple number (the three numeric-value fields of
    UnicodeData.txt).
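To make "non-decimal positional digit" concrete, here is a minimal Python sketch of evaluating a positional numeral in a caller-supplied radix from per-character digit values, assuming most-significant-digit-first order. Python's standard `unicodedata` module exposes the UCD digit value and has none for the hex letters or for private-use code points, so the value 10 for the draft's E06A TENGWAR DIGIT TEN comes from a hypothetical supplementary table (`EXTRA_DIGIT_VALUES`), not from real Unicode data. The ASCII "1" stands in for a Tengwar digit one purely for illustration.

```python
import unicodedata

# Hypothetical supplementary data: the UCD assigns no numeric values to
# private-use code points, so the draft's E06A TENGWAR DIGIT TEN gets its
# value here. Illustrative only; this is not real UCD data.
EXTRA_DIGIT_VALUES = {"\uE06A": 10}


def digit_value(ch: str) -> int:
    """Return the digit value of ch, preferring the supplementary table."""
    if ch in EXTRA_DIGIT_VALUES:
        return EXTRA_DIGIT_VALUES[ch]
    # unicodedata.digit raises ValueError for characters with no digit
    # value, which includes the hex letters a-f and private-use code points.
    return unicodedata.digit(ch)


def positional_value(numeral: str, radix: int) -> int:
    """Evaluate numeral as a positional number, most significant digit first."""
    total = 0
    for ch in numeral:
        d = digit_value(ch)
        if not 0 <= d < radix:
            raise ValueError(f"{ch!r} (value {d}) is not a base-{radix} digit")
        total = total * radix + d
    return total


if __name__ == "__main__":
    print(positional_value("2004", 10))
    print(positional_value("1\uE06A", 12))  # 1 * 12 + 10
    try:
        positional_value("1a", 16)
    except ValueError as exc:
        print("no UCD value:", exc)
```

The point of the sketch is that the radix lives outside the per-character data: the UCD can say what value a digit has, but only the caller knows it is being read in base 12.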

    > > Or would
    > > E06A;TENGWAR DIGIT TEN;Nl;0;L;;10;10;10;N;;;;;
    > > be sufficient?
    >
    > I think the General Category has to be No rather than Nl. Very few
    > characters are of type Nl -- just the Roman numerals, "Hangzhou"
    > numerals and Ideographic Zero, and Runic and Gothic letter-numbers.
    > Tengwar duodecimal digits aren't letters that got pressed into service
    > as numbers, they're just digits that happen to be base-12.

    Given that the zero and one were letters (at least in the Tengwar
    draft I looked at; there are several drafts at present), and that
    the precedent set by Runic and Gothic is that when some numbers are
    letters and the rest are not, the extra numbers get category Nl
    instead of No, I would think that Nl would be appropriate for
    Tengwar. If Tengwar were altered to disunify zero and one from the
    identically shaped letters, then No would indeed be appropriate.

    > Also, of the three "10" values, you need to remove the first -- it's
    > only valid for characters with the decimal digit property (see
    > http://www.unicode.org/Public/UNIDATA/UCD.html for more details).

    As I said above, Unicode does not currently have any digits used
    in a non-decimal positional system. Extending either category Nd
    or the use of the triple number in UnicodeData.txt seemed an
    appropriate way to add them. Looking over the data files again,
    I see that there are Nd characters without the triple number that
    are not intended to be used in positional systems, so I think that
    if Unicode chooses to provide a mechanism to specify non-decimal
    positional digits, using the triple number is probably the best
    approach.
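The triple number is fields 6, 7 and 8 of each UnicodeData.txt record (decimal digit value, digit value, numeric value). A small Python sketch can pull those fields out of the draft record quoted earlier and apply the one rule Doug cites: a decimal digit value is valid only for characters with the decimal digit property (category Nd). The function names are hypothetical, and the check covers only that rule, not the full set of UCD consistency requirements.

```python
from typing import List, NamedTuple

# UnicodeData.txt records have 15 semicolon-separated fields (0-14).
FIELD_COUNT = 15


class NumericFields(NamedTuple):
    code_point: str        # field 0
    general_category: str  # field 2
    decimal: str           # field 6: decimal digit value
    digit: str             # field 7: digit value
    numeric: str           # field 8: numeric value


def parse_numeric_fields(record: str) -> NumericFields:
    """Extract the code point, category and triple number from one record."""
    fields = record.rstrip("\n").split(";")
    if len(fields) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(fields)}")
    return NumericFields(fields[0], fields[2], fields[6], fields[7], fields[8])


def decimal_field_problems(nf: NumericFields) -> List[str]:
    """Flag a decimal digit value on a character whose category is not Nd."""
    if nf.decimal and nf.general_category != "Nd":
        return [f"{nf.code_point}: field 6 is set but category is "
                f"{nf.general_category}, not Nd"]
    return []


if __name__ == "__main__":
    draft = "E06A;TENGWAR DIGIT TEN;Nl;0;L;;10;10;10;N;;;;;"
    # Same record with the first of the three values removed, as Doug
    # suggests; the Nl vs No question is left aside here.
    revised = "E06A;TENGWAR DIGIT TEN;Nl;0;L;;;10;10;N;;;;;"
    for rec in (draft, revised):
        nf = parse_numeric_fields(rec)
        print(nf.decimal or "-", nf.digit, nf.numeric, decimal_field_problems(nf))
```

Under this reading, the remaining digit and numeric fields (both 10) are where a non-decimal positional digit would carry its value.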



    This archive was generated by hypermail 2.1.5 : Wed Apr 28 2004 - 11:28:57 EDT