Re: Tamil glyphs

From: Antoine Leca (Antoine.Leca@renault.fr)
Date: Wed Sep 13 2000 - 10:38:20 EDT


Marco Cimarosti wrote:
>
> Antoine Leca wrote:
> > I am not sure this is the only way to interpret the use of ZWNJ here.
> > Another way would be to consider the sequence ka+halant to be
> > a separate syllable, and then ka+i to be a second syllable. Then,
> > the correct rendering would be
> > ka_nominal, virama, i_matra, ka_nominal
>
> No, I think this would not be a correct implementation.
>
> If I remember correctly, this behaviour is described in the Devanagari block
> chapter of the Unicode book.
>
> ZWJ and ZWNJ should have this special meaning in conjunction with
> Devanagari's virama, and any compliant renderer should implement it:

We all agree on that. The problem is the interaction between this "explicit
virama" (described on page 214) and the reordering of the i vowel sign
(described on page 220). How the two rules relate is not at all clear, I think.
And as a result, implementations vary.

 
> 1) <consonant + virama + ZWJ> should render the "half consonant" glyph, if
> available, regardless of the context.

Yes, though that is irrelevant to this matter (but I shall return to it later).
 
> 2) <consonant + virama + ZWNJ> should render the "nominal" glyph with a
> visible combining virama, regardless of the context.

Paragraph 6 on page 214, titled "explicit virama", says: "[...] placing the
character U+200C zero width non joiner immediately after the encoded
dead consonant that is to be excluded from conjunct formation."

So that could be read as something weightier than a mere glyph-rendering
issue: it affects the process of conjunct formation itself.

Also, rule R15 on page 200, which explains the behaviour of the reordering
short i, states that "When the dependent vowel Ivs is used [...], it is placed
to the extreme left of the orthographic syllable."

Up to that point, it seems that your position is the correct one (and that the
implementations in the field are buggy).

R15 continues with "If the syllable contains a consonant cluster, then this
vowel is always depicted to the left of that cluster." But, as written above,
when ZWNJ is used, the dead consonant is no longer part of the cluster!

Can of worms, can of worms...
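To make the two readings concrete, here is a toy Python sketch. It is not taken
from any real shaping engine; the function render(), its zwnj_splits_syllable
flag and the glyph names are all hypothetical. It places a trailing reordering
i sign either to the left of the whole cluster (your reading) or only to the
left of the consonant that follows <consonant + virama + ZWNJ> (the
"separate syllable" reading):

```python
# Toy model of the two readings of <KA, VIRAMA, ZWNJ, KA, I>.
# Hypothetical glyph names; no real font or shaping engine is consulted.
KA = "\u0915"        # DEVANAGARI LETTER KA
VIRAMA = "\u094D"    # DEVANAGARI SIGN VIRAMA
ZWNJ = "\u200C"      # ZERO WIDTH NON-JOINER
I_SIGN = "\u093F"    # DEVANAGARI VOWEL SIGN I (the reordering short i)


def to_glyphs(chars):
    """Map code points to hypothetical glyph names, dropping the invisible ZWNJ."""
    names = {KA: "ka_nominal", VIRAMA: "virama", I_SIGN: "i_matra"}
    return [names[c] for c in chars if c != ZWNJ]


def render(chars, zwnj_splits_syllable):
    """Move a trailing I_SIGN to the left of the cluster it belongs to.

    zwnj_splits_syllable=False: ZWNJ only forces the explicit virama, so the
    dead consonant stays in the cluster and i goes to the far left.
    zwnj_splits_syllable=True: <consonant, VIRAMA, ZWNJ> closes its own
    syllable, so i goes left of the following consonant only.
    """
    chars = list(chars)
    if not chars or chars[-1] != I_SIGN:
        return to_glyphs(chars)
    body = chars[:-1]
    start = 0  # index where the cluster that receives the i sign begins
    if zwnj_splits_syllable:
        for pos in range(1, len(body)):
            if body[pos] == ZWNJ and body[pos - 1] == VIRAMA:
                start = pos + 1  # next syllable begins after the ZWNJ
    reordered = body[:start] + [I_SIGN] + body[start:]
    return to_glyphs(reordered)


seq = [KA, VIRAMA, ZWNJ, KA, I_SIGN]
print(render(seq, zwnj_splits_syllable=False))
# ['i_matra', 'ka_nominal', 'virama', 'ka_nominal']
print(render(seq, zwnj_splits_syllable=True))
# ['ka_nominal', 'virama', 'i_matra', 'ka_nominal']
```

Both outputs follow from some reading of pages 200 and 214; only the second
one matches "ka_nominal, virama, i_matra, ka_nominal" as I wrote it earlier.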

 
> Apart from these special display requirements, both sequences should be
> considered as an ordinary "dead consonant" (<consonant + virama>) and, if
> they precede another consonant, should regularly form a "consonant
> cluster". And, consequently, the i vowel sign should reorder around the
> whole cluster.

This is well understood (and the latter point is specific to Devanagari and
related scripts; Tamil and Oriya behave differently here).

On the other hand, ZWJ should otherwise retain its normal behaviour, which is
described on page 215 as (sorry, I quote from memory) preventing the use of a
specific ligature or cluster when one is available. So, here, I agree with
Mr McGowan and disagree with Mr Kaplan, and it appears that when applied to
Tamil, the sequence

      XA + ZWJ + AIvs

(where XA is any one of NNA, NNNA, LA and LLA) _could_ be interpreted as
preventing the use of the elephant-trunk form of ai, without breaking the
previous rules.

 
> I don't know how these rules extend to other Indic scripts. I think that #2
> is general, while #1 only makes sense for other scripts having "half
> consonants" (e.g. Gujarati).

Even Tamil, with its single ligature KSSA, is affected: how do you expect
Tamil KA + VIRAMA + ZWJ + SSA to be rendered?
I expect the ZWJ to prevent the ligature, so that it appears as
KA_with_pulli SSA, exactly the same as KA + VIRAMA + ZWNJ + SSA.
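As a toy illustration of that expectation (a sketch only: render_k_ssa() and
the glyph names are hypothetical, and it assumes, as I propose, that either
joiner blocks the KSSA ligature):

```python
KA = "\u0B95"      # TAMIL LETTER KA
PULLI = "\u0BCD"   # TAMIL SIGN VIRAMA (pulli)
SSA = "\u0BB7"     # TAMIL LETTER SSA
ZWNJ, ZWJ = "\u200C", "\u200D"


def render_k_ssa(chars):
    """Toy glyph selection for KA + PULLI [+ joiner] + SSA.

    Assumption: either ZWJ or ZWNJ between the dead KA and SSA blocks
    the KSSA ligature. Glyph names are hypothetical.
    """
    chars = list(chars)
    if chars == [KA, PULLI, SSA]:
        return ["kssa_ligature"]  # no joiner: the usual ligature
    if (len(chars) == 4 and chars[:2] == [KA, PULLI]
            and chars[2] in (ZWJ, ZWNJ) and chars[3] == SSA):
        return ["ka_with_pulli", "ssa"]  # ligature suppressed
    raise ValueError("sequence outside this toy model")


print(render_k_ssa([KA, PULLI, ZWJ, SSA]))   # ['ka_with_pulli', 'ssa']
print(render_k_ssa([KA, PULLI, ZWNJ, SSA]))  # ['ka_with_pulli', 'ssa']
```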

 
> U+0BA9, U+200C, U+0BC8 (nnna, ZWNJ, ai matra)
>
> would regularly reorder as:
>
> 0BC8, 0BA9, 200C (ai matra, nnna, ZWNJ)
>
> producing the following sequence of glyphs
>
> ai_matra, nnna
>
> The ZWNJ would simply be there to prevent the normal single-glyph sequence:
>
> nnna_ai_matra_ligature
>
> Notice that, unlike the rules for <virama + ZW[N]J> above, this is just my
> idea, not part of the standard.

I agree with your idea, but using ZWJ instead.
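The difference between the two proposals is only which control character
suppresses the ligature. A toy Python sketch, where render_ai(), its blocker
parameter and the glyph names are hypothetical (blocker=ZWNJ models your idea,
blocker=ZWJ models mine):

```python
NNA, NNNA, LA, LLA = "\u0BA3", "\u0BA9", "\u0BB2", "\u0BB3"
AI_SIGN = "\u0BC8"  # TAMIL VOWEL SIGN AI, drawn to the left of its consonant
ZWNJ, ZWJ = "\u200C", "\u200D"

# Consonants that take the traditional (elephant-trunk) ai form.
TRUNK_CONSONANTS = {NNA: "nna", NNNA: "nnna", LA: "la", LLA: "lla"}


def render_ai(chars, blocker):
    """Toy rendering of <consonant, [joiner], AI_SIGN>.

    `blocker` is the control character assumed to suppress the ligature.
    Any other joiner in the middle is simply ignored by this toy model.
    """
    consonant, *middle, vowel = chars
    if vowel != AI_SIGN or consonant not in TRUNK_CONSONANTS:
        raise ValueError("toy model only handles NNA/NNNA/LA/LLA + AI")
    name = TRUNK_CONSONANTS[consonant]
    if blocker in middle:
        # Ligature suppressed: plain ai sign reordered before the consonant.
        return ["ai_matra", name]
    return [name + "_ai_matra_ligature"]


print(render_ai([NNNA, ZWNJ, AI_SIGN], blocker=ZWNJ))  # ['ai_matra', 'nnna']
print(render_ai([NNNA, ZWJ, AI_SIGN], blocker=ZWJ))    # ['ai_matra', 'nnna']
print(render_ai([NNNA, AI_SIGN], blocker=ZWJ))         # ['nnna_ai_matra_ligature']
```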

We agree this is an area where we really need some clarification, and firmer
implementation guidance from the Unicode Consortium. How does one request a
stronger rule of interpretation?

Antoine



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:13 EDT