RE: Eastern Arabic-Indic Digits & Marathi Allographs

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Oct 02 2006 - 17:41:18 CST


    Jarkko Ahonen asked (last week):

    > Is Unicode going to have separate Unicode values for the Farsi (Persian)
    > and Urdu digits as they now have same values but with glyph variation
    > (digits 4, 6 and 7)?

    The answer to this has been documented in the standard for
    some time. See:

    http://www.unicode.org/versions/Unicode4.0.0/ch08.pdf

    and look at Table 8-2, Glyph Variation in Eastern Arabic-Indic Digits.

    The variation in form for the digits 4, 6, and 7 between Persian,
    Sindhi, and Urdu is considered *glyph* variation within the
    range of Eastern Arabic-Indic digits. It is comparable, for
    example, to the range of glyph shapes found for ASCII digits
    in different parts of the world.
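In practice this means a Persian document and an Urdu document both store digit four as the same code point, U+06F4, and differ only in which glyph the font draws. The sketch below uses only Python's standard `unicodedata` module; the helper name `to_ascii_digits` is hypothetical, chosen for illustration, and is not part of any Unicode API.

```python
import unicodedata

def to_ascii_digits(text: str) -> str:
    """Replace every decimal digit (general category Nd) with its ASCII form.

    One routine serves Persian, Sindhi, and Urdu text alike, because all
    three use the same code points U+06F0..U+06F9; the language-specific
    shapes of 4, 6, and 7 are chosen by the font, not by the encoding.
    """
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-digits
        out.append(str(value) if value is not None else ch)
    return "".join(out)

four = "\u06F4"
print(unicodedata.name(four))     # EXTENDED ARABIC-INDIC DIGIT FOUR
print(unicodedata.decimal(four))  # 4
print(to_ascii_digits("\u06F1\u06F4\u06F6\u06F7"))  # 1467
```

The same string converts identically no matter which of the three languages it came from; there is no separate "Urdu four" to special-case.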

    In fact, the main reason for distinguishing the range of Arabic
    digits U+0660..U+0669 from the range of Eastern Arabic-Indic
    digits U+06F0..U+06F9 in the standard at all is not the variation
    in glyph forms for 4, 5, 6, and 7, but rather the difference
    in their bidirectional character properties: bc=AN versus bc=EN,
    respectively, which matters for several rules in the Bidirectional
    Algorithm.
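The property difference can be checked directly: `unicodedata.bidirectional()` in Python's standard library returns a character's Bidi_Class. A small sketch, assuming the Unicode data bundled with the running Python version; the helper name `bidi_classes` is hypothetical.

```python
import unicodedata

def bidi_classes(start: int, end: int) -> set[str]:
    """Return the set of Bidi_Class values for code points start..end inclusive."""
    return {unicodedata.bidirectional(chr(cp)) for cp in range(start, end + 1)}

print(bidi_classes(0x0660, 0x0669))  # {'AN'}  Arabic-Indic digits
print(bidi_classes(0x06F0, 0x06F9))  # {'EN'}  Eastern (Extended) Arabic-Indic digits
print(bidi_classes(0x0030, 0x0039))  # {'EN'}  ASCII digits, for comparison

# The numeric values are identical; only the bidi behavior differs.
for a, e in zip(range(0x0660, 0x066A), range(0x06F0, 0x06FA)):
    assert unicodedata.decimal(chr(a)) == unicodedata.decimal(chr(e))
```

Note that the Eastern Arabic-Indic digits share their class (EN) with the ASCII digits, not with the Arabic-Indic digits.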

    > How about the Marathi allographs of LA (U+0932) and SHA (U+0936)?

    They are allographs, as documented, and hence are treated as glyph
    variants of those code points. There is no intention of creating
    separate encoded characters for them.
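Because the Marathi forms are glyph variants, Hindi and Marathi text containing LA or SHA is byte-for-byte the same character data; any request for the Marathi shapes has to travel out of band, for example as a language tag in markup. A minimal sketch under that assumption: the `tagged` helper and the HTML wrapping are hypothetical, and whether a given renderer actually switches glyphs depends on the font and shaping engine.

```python
import unicodedata

LA = "\u0932"   # DEVANAGARI LETTER LA
SHA = "\u0936"  # DEVANAGARI LETTER SHA

def tagged(text: str, lang: str) -> str:
    """Wrap text in an HTML span with a BCP 47 language tag (hypothetical helper).

    The tag is the only thing that differs; the code points are unchanged.
    """
    return f'<span lang="{lang}">{text}</span>'

sample = LA + SHA
hindi = tagged(sample, "hi")
marathi = tagged(sample, "mr")

print(unicodedata.name(LA), "/", unicodedata.name(SHA))
# Same code points in both spans; only the lang attribute differs.
print(hindi.replace('"hi"', '"mr"') == marathi)  # True
```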

    --Ken



    This archive was generated by hypermail 2.1.5 : Mon Oct 02 2006 - 17:44:46 CST