Re: Numbered consonants in Tamil script abugida series

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Wed Jun 29 2005 - 14:18:57 CDT

Next message: Rick McGowan: "Unicode.org will go down soon"

Previous message: Eric Muller: "Re: Measuring a writing system "economy"/"accuracy""
Maybe in reply to: N. Ganesan: "Numbered consonants in Tamil script abugida series"

I'm resending this, as the version I sent over 11 hours ago has not yet made
it to the archive.
----- Original Message -----
From: "Richard Wordingham" <richard.wordingham@ntlworld.com>
To: "Unicode List" <unicode@unicode.org>
Sent: Wednesday, June 29, 2005 8:57 AM
Subject: Re: Numbered consonants in Tamil script abugida series

I asked:
<<<
4. The subscript '2', '3' and '4' defy useful abstract analysis. They follow
the connected glyph portion containing the consonant, preceding the glyph of
VOWEL SIGN AA or AU LENGTH MARK. There seems to be no way to represent them
in combination with those glyphs using Unicode! Can anyone see how (short of
burying our heads in the sand) we can avoid adding at least combining marks
TAMIL VARGA MARK TWO, TAMIL VARGA MARK THREE and TAMIL VARGA MARK FOUR?
<vowel, varga mark> and <varga mark, vowel> will be canonically inequivalent.
>>>

and N Ganesan answered:

> Can't we generate these subscripted abugidas on k, c, T, t, p using
> subscripts/superscripts? For collation etc., may be we can get the varga
> marks in the Tamil code chart itself. Then can you be able to do analysis?
> For any usage samples, I'll be ready to help.

I wasn't being ingenious enough. In part I was confused because Uniscribe
can't render them in the above cases. However, பெ₄ௗ /bhau/, for example,
can be entered as U+0BAA U+0BC6 U+2084 U+0BD7, so we can probably scrub the
need for separate varga marks. Phew! Note that the number often comes
immediately after a vowel sign rather than the consonant, in visual as well
as in code point order, e.g. தி₃ /di/ U+0BA4 U+0BBF U+2083. How does one ask
Microsoft to support these subscripts and superscripts? Uniscribe inserts
the dashed circle between a superscript or subscript and a (part) vowel
mark.
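
For anyone who wants to check the two sequences above, here is a minimal
Python sketch. The helper names (describe, survives_nfc) are mine and purely
illustrative. It lists the character names and checks that the subscript
digit between VOWEL SIGN E and AU LENGTH MARK is left alone by canonical
normalisation (NFC). Compatibility normalisation (NFKC) is a different
matter, since it folds the subscript into an ordinary digit.

```python
import unicodedata

# Code point sequences quoted in the message.
BHAU = "\u0BAA\u0BC6\u2084\u0BD7"  # /bhau/: PA, VOWEL SIGN E, SUBSCRIPT FOUR, AU LENGTH MARK
DI = "\u0BA4\u0BBF\u2083"          # /di/: TA, VOWEL SIGN I, SUBSCRIPT THREE


def describe(text):
    """Return 'U+XXXX NAME' for each code point in text (illustrative helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def survives_nfc(text):
    """True if canonical composition (NFC) leaves the sequence unchanged."""
    return unicodedata.normalize("NFC", text) == text


for label, seq in (("bhau", BHAU), ("di", DI)):
    print(label, describe(seq), "NFC-stable:", survives_nfc(seq))

# Without the subscript, NFC fuses the two vowel parts into VOWEL SIGN AU.
print(describe(unicodedata.normalize("NFC", "\u0BAA\u0BC6\u0BD7")))

# NFKC replaces the subscript with a plain ASCII digit, losing the distinction.
print(describe(unicodedata.normalize("NFKC", DI)))
```

So the sequences look safe in data that is only ever NFC-normalised, but a
process that applies NFKC would turn the subscript into a plain digit.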

We don't need the marks in the table for collation. What we do need to
know is how to sort them. Does consonant plus number sort as a separate
consonant, or is it like an accent in French or Spanish? In these
languages, accents are only taken into account when words differ only by the
presence of the accent, so I am wondering if the same is true of the numbers
in words in Tamil script. Or would Tamil sorting rules not be applied [to
such words]?
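
To make the two alternatives concrete, here is a rough Python sketch. It is
not a proposal for real Tamil collation: raw code point order stands in for
the proper letter order, the three test strings are just syllables rather
than attested words, and both key functions (accent_like_key,
separate_letter_key) are hypothetical names of my own.

```python
import unicodedata

# Subscript digits used as varga numbers; an unmarked consonant counts as 1.
SUBSCRIPTS = {"\u2082": 2, "\u2083": 3, "\u2084": 4}


def accent_like_key(word):
    """Accent-style: ignore the numbers first, use them only to break ties."""
    primary = "".join(ch for ch in word if ch not in SUBSCRIPTS)
    return (primary, word)


def separate_letter_key(word):
    """Separate-letter style: consonant plus number sorts as its own letter.

    The number is attached to the preceding base character, so it outranks
    any vowel sign that happens to sit between them in code point order.
    """
    units = []
    for ch in word:
        if ch in SUBSCRIPTS and units:
            units[-1][1] = SUBSCRIPTS[ch]
        elif unicodedata.category(ch).startswith("M") and units:
            units[-1][2] += ch
        else:
            units.append([ch, 1, ""])
    return [tuple(unit) for unit in units]


TI = "\u0BA4\u0BBF"        # ti
DI = "\u0BA4\u0BBF\u2083"  # di (ti with subscript three)
TII = "\u0BA4\u0BC0"       # tii

syllables = [TII, DI, TI]
print(sorted(syllables, key=accent_like_key))
print(sorted(syllables, key=separate_letter_key))
```

Under the accent-like key, di sorts directly after ti and before tii; under
the separate-letter key, di sorts after both. That difference is what the
question above is asking about.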

Richard.

This archive was generated by hypermail 2.1.5 : Wed Jun 29 2005 - 14:19:58 CDT