Re: Numbered consonants in Tamil script abugida series

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Wed Jun 29 2005 - 14:18:57 CDT

    I'm resending this, as the version I sent over 11 hours ago has not yet made
    it to the archive.
    ----- Original Message -----
    From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
    To: "Unicode List" <unicode@unicode.org>
    Sent: Wednesday, June 29, 2005 8:57 AM
    Subject: Re: Numbered consonants in Tamil script abugida series

    I asked:
    <<<
    4. The subscript '2', '3' and '4' defy useful abstract analysis. They follow
    the connected glyph portion containing the consonant, preceding the glyph of
    VOWEL SIGN AA or AU LENGTH MARK. There seems to be no way to represent them
    in combination with those glyphs using Unicode! Can anyone see how (short of
    burying our heads in the sand) we can avoid adding at least combining marks
    TAMIL VARGA MARK TWO, TAMIL VARGA MARK THREE and TAMIL VARGA MARK FOUR?
    <vowel, varga mark> and <varga mark, vowel> will be canonically inequivalent.
    >>>

    and N Ganesan answered:

    > Can't we generate these subscripted abugidas on k, c, T, t, p using
    > subscripts/superscripts? For collation etc., maybe we can get the varga
    > marks in the Tamil code chart itself. Then would you be able to do the
    > analysis? For any usage samples, I'll be ready to help.

     I wasn't being ingenious enough. In part I was confused because Uniscribe
    can't render them in the above cases. However, பெ₄ௗ /bhau/, for example,
    can be entered as U+0BAA U+0BC6 U+2084 U+0BD7, so we can probably scrub the
    need for separate varga marks. Phew! Note that the number often comes
    immediately after a vowel rather than the consonant in visual as well as in
    code point order, e.g. தி₃ /di/ U+0BA4 U+0BBF U+2083. How does one ask
    Microsoft to support these subscripts and superscripts? Uniscribe inserts
    the dashed circle between a superscript or subscript and a (part) vowel
    mark.
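
    The two sequences above can be checked mechanically. What follows is only
    a sketch using Python's standard unicodedata module; the names bhau, di,
    describe, plain_au and marked_au are made up for the illustration. It
    lists the characters in each sequence and then compares NFC normalization
    with and without the subscript between the two vowel parts.

```python
import unicodedata

# The two sequences quoted in the message, built from their code points.
bhau = "\u0BAA\u0BC6\u2084\u0BD7"  # PA, VOWEL SIGN E, SUBSCRIPT FOUR, AU LENGTH MARK
di = "\u0BA4\u0BBF\u2083"          # TA, VOWEL SIGN I, SUBSCRIPT THREE


def describe(text):
    """Return (code point, character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for label, seq in (("bhau", bhau), ("di", di)):
    print(label)
    for code_point, name in describe(seq):
        print(f"  {code_point}  {name}")

# Without the subscript, NFC composes VOWEL SIGN E + AU LENGTH MARK into the
# single character U+0BCC. With the subscript in between, nothing composes.
plain_au = unicodedata.normalize("NFC", "\u0BAA\u0BC6\u0BD7")
marked_au = unicodedata.normalize("NFC", bhau)
print(describe(plain_au))
print(marked_au == bhau)
```

    So the subscripted spelling survives normalization as typed, whereas the
    unsubscripted two-part vowel is folded into one code point.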

     We don't need the marks in the table for collation. What we do need to
    know is how to sort them. Does consonant plus number sort as a separate
    consonant, or is it like an accent in French or Spanish? In these
    languages, accents are only taken into account when words differ only by the
    presence of the accent, so I am wondering if the same is true of the numbers
    in words in Tamil script. Or would Tamil sorting rules not be applied [to
    such words]?
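
    To make the two alternatives concrete, here is a rough Python sketch. It
    is not a statement of Tamil sorting rules, which are exactly what is being
    asked about. Plain code point order stands in for the real primary
    ordering, the three test strings are arbitrary sequences rather than
    attested words, and the names accent_like_key and separate_letter_key are
    invented for the illustration.

```python
SUBSCRIPT_DIGITS = {"\u2082", "\u2083", "\u2084"}  # subscript 2, 3 and 4


def accent_like_key(word):
    """Compare without the numbers first; use them only to break ties."""
    primary = "".join(ch for ch in word if ch not in SUBSCRIPT_DIGITS)
    return (primary, word)


def separate_letter_key(word):
    """Let the number count at the first level (here: raw code point order)."""
    return word


# Arbitrary test strings: TA + VOWEL SIGN I, then an optional subscript,
# then KA or PA.
marked_ka = "\u0BA4\u0BBF\u2083\u0B95"
plain_pa = "\u0BA4\u0BBF\u0BAA"
plain_ka = "\u0BA4\u0BBF\u0B95"
words = [marked_ka, plain_pa, plain_ka]

accent_like = sorted(words, key=accent_like_key)
separate = sorted(words, key=separate_letter_key)
print(accent_like)
print(separate)
```

    Under the accent-like key the subscripted string sorts directly after its
    unmarked twin; under the other key it is pushed past every string that
    has no number in that position.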

     Richard.


