RE: Tamil Text Messaging in Mobile Phones

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Tue Jul 30 2002 - 09:58:37 EDT


[Sorry for coming so late to this lengthy discussion, and sorry if I am
repeating or misunderstanding something. I hope that at least I am on topic
this time.]

Michael (Michka) Kaplan wrote:
> The end desired result is text that looks identical to what is in
> Unicode now -- but the backend store or "logical" ordering will have
> to be changed.

Perhaps I am missing something, but I can't help thinking that Michka and
others (including Doug and perhaps Asmus) are misunderstanding a key point
in Sinnaturai Srivas's proposal.

This "Linear Tamil", if I understand it, is exactly the opposite of what
Michka said above:

        << The end desired result is text that looks DIFFERENT to what TAMIL
PRESENTATION is now -- but the Unicode backend store or "logical" ordering
will NOT have to be changed. >>

I tried to imagine what should be changed in Unicode to accommodate this
"reformed Tamil". This is my guess:

        1. Code point assignments will not change, of course.

        2. "Logical" order will not change: it will only become visible in
the rendered text.

        3. Character properties will NOT change. This is perhaps the most
counter-intuitive point, so I'll explain it below.

It is natural to assume that reordrant characters should have a different
Combining Class from other characters, but this assumption is wrong!
Reordrant Indic characters are in combining class ZERO, the default class!
Nothing in the Standard MANDATES that they must be reordrant!

See
<http://www.unicode.org/Public/UNIDATA/UnicodeData.html#Canonical%20Combining%20Classes>:

        [...]
        Canonical Combining Classes
        Value Description
        0: Spacing, split, enclosing, reordrant, and Tibetan subjoined
        [...]

See also <http://www.unicode.org/Public/UNIDATA/UnicodeData.txt>:

        [...]
        0BC6;TAMIL VOWEL SIGN E;Mc;0;L;;;;;N;;;;;
        0BC7;TAMIL VOWEL SIGN EE;Mc;0;L;;;;;N;;;;;
        0BC8;TAMIL VOWEL SIGN AI;Mc;0;L;;;;;N;;;;;
        0BCA;TAMIL VOWEL SIGN O;Mc;0;L;0BC6 0BBE;;;;N;;;;;
        0BCB;TAMIL VOWEL SIGN OO;Mc;0;L;0BC7 0BBE;;;;N;;;;;
        0BCC;TAMIL VOWEL SIGN AU;Mc;0;L;0BC6 0BD7;;;;N;;;;;
        [...]
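
The same fields can be checked programmatically. What follows is only a
minimal sketch: it assumes Python's standard unicodedata module, which
ships its own (later) copy of the Unicode Character Database rather than
the 2002 file quoted above, and the names VOWEL_SIGNS, describe and rows
are made up for this illustration.

```python
import unicodedata

# The six Tamil vowel signs quoted from UnicodeData.txt above.
VOWEL_SIGNS = [0x0BC6, 0x0BC7, 0x0BC8, 0x0BCA, 0x0BCB, 0x0BCC]


def describe(cp):
    """Return (name, general category, combining class, decomposition)."""
    ch = chr(cp)
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),      # canonical combining class
        unicodedata.decomposition(ch),  # empty string if none
    )


rows = {cp: describe(cp) for cp in VOWEL_SIGNS}

# Print in a layout loosely modelled on the UnicodeData.txt lines.
for cp, (name, category, ccc, decomposition) in rows.items():
    print(f"{cp:04X};{name};{category};{ccc};{decomposition}")
```

Every row should report combining class 0: nothing in these fields sets
the left-standing signs apart from any other spacing mark.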

The only place where the Standard talks about reordrant and enclosing vowel
signs is the names list, whose contents are summarized in
<http://www.unicode.org/Public/UNIDATA/NamesList.txt>:

        [...]
        0BC6 TAMIL VOWEL SIGN E
        * stands to the left of the consonant
        0BC7 TAMIL VOWEL SIGN EE
        * stands to the left of the consonant
        0BC8 TAMIL VOWEL SIGN AI
        * stands to the left of the consonant
        0BCA TAMIL VOWEL SIGN O
        * pieces on both sides of the consonant
        : 0BC6 0BBE
        0BCB TAMIL VOWEL SIGN OO
        * pieces on both sides of the consonant
        : 0BC7 0BBE
        0BCC TAMIL VOWEL SIGN AU
        * pieces on both sides of the consonant
        : 0BC6 0BD7
        [...]

But, according to <http://www.unicode.org/Public/UNIDATA/NamesList.html>,
the text following asterisks is just comments:

        [...]
        COMMMENT_LINE: <tab> "*" SP EXPAND_LINE
                              // * is replaced by BULLET, output line as comment
                              <tab> EXPAND_LINE
                              // output line as comment
        [...]
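
To make the distinction concrete, here is a rough sketch of how a tool
might separate the informative "*" comments from the rest of an entry. It
is not the official NamesList parser: it assumes tab-indented lines as in
the grammar above, handles only the three line kinds quoted in this
message, and SAMPLE and split_entries are hypothetical names.

```python
# Two entries from the excerpt above, with the tabs written out.
SAMPLE = (
    "0BC6\tTAMIL VOWEL SIGN E\n"
    "\t* stands to the left of the consonant\n"
    "0BCA\tTAMIL VOWEL SIGN O\n"
    "\t* pieces on both sides of the consonant\n"
    "\t: 0BC6 0BBE\n"
)


def split_entries(text):
    """Group NamesList-style lines by code point (simplified)."""
    entries = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line.startswith("\t"):
            # Name line: code point, tab, character name.
            code, _, name = line.partition("\t")
            current = {"name": name, "comments": [], "decomposition": None}
            entries[code] = current
        elif current is not None:
            body = line.strip()
            if body.startswith("* "):
                # Comment line: informative text only.
                current["comments"].append(body[2:])
            elif body.startswith(": "):
                # Canonical decomposition line.
                current["decomposition"] = body[2:]
    return entries


entries = split_entries(SAMPLE)
for code, entry in entries.items():
    print(code, entry["name"], entry["comments"], entry["decomposition"])
```

The positional remarks end up only in the "comments" list; the name and
the decomposition, which carry the actual character data, say nothing
about where the sign is drawn.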

Therefore, from the point of view of Unicode, "Linear Tamil" is just an
alternate presentation of the SAME backing store, encoded in the SAME
Unicode. In the worst of cases, Unicode may need to add a single word in a
few comments: "stands to the left of the consonant, TRADITIONALLY".
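
As an illustration of what "the same backing store" means, the sketch
below looks at the stored sequence for one syllable. It again assumes
Python's standard unicodedata module, and the choice of the syllable "ke"
(KA followed by VOWEL SIGN E) is mine, not something taken from the
proposal.

```python
import unicodedata

# Logical order in memory: consonant first, vowel sign second.
syllable = "\u0B95\u0BC6"  # TAMIL LETTER KA + TAMIL VOWEL SIGN E

stored = [f"{ord(ch):04X} {unicodedata.name(ch)}" for ch in syllable]
for item in stored:
    print(item)

# Normalization does not touch this sequence.
nfc = unicodedata.normalize("NFC", syllable)
nfd = unicodedata.normalize("NFD", syllable)
print(nfc == syllable, nfd == syllable)
```

Whether a renderer draws the vowel sign to the left of the consonant or
after it, the code points and their order in the string are the ones
shown here.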

Now, I don't know if this reform is good or bad. My impression is that it is
totally useless, and that it can be avoided with a minimal technical
effort... However, only Tamil people are entitled to decide and, whatever
their decision will be, the important point for Unicode is that it is NOT
going to affect character properties or any other aspect of the Standard.

Of course, it can become a problem for smart font technologies which choose
to support both Traditional and Reformed Tamil -- but this is not the
OpenType mailing list, right?

_ Marco


