RE: Errors in the Indic FAQ

From: Marco Cimarosti ([email protected])
Date: Mon Nov 18 2002 - 10:55:49 EST

Next message: Frank da Cruz: "RE: The result of the plane 14 tag characters review."

Previous message: Jim Allan: "Re: Designing Vietnamese diacritics"
Maybe in reply to: Andy White: "Errors in the Indic FAQ"
Next in thread: Andy White: "RE: Errors in the Indic FAQ"
Reply: Andy White: "RE: Errors in the Indic FAQ"

Andy White wrote:
> A graphical version of this message available here:
> http://www.exnet.btinternet.co.uk/KhandaWeb/khanda.htm
>
> It is proposed by the Indic Unicode FAQ that Bengali
> Khanda_Ta should be encoded as Ta Virama ZWJ ... and that an
> explicit Ta_Virama can be encoded as Ta Virama ZWNJ. This
> information is wrong and must be changed.

I guess the FAQ is <http://www.unicode.org/unicode/faq/indic.html#19>,
right?

That FAQ is indeed wrong! (And I feel guilty for it: it was inspired by the
preceding FAQ which I submitted about Devanagari and, probably, I was also
asked to double-check it...)

However, IMHO, the only fix needed is deleting this sentence:

"If the sequence U+09A4, U+09CD is not followed by another consonant
letter (such as " ta") it is always displayed as a full ta glyph combined
with the virama glyph."

> First some background facts for the unacquainted.
> Khanda Ta is equivalent to Ta Virama i.e. it is a halant form of Ta.
> Khanda Ta is respected as a separate letter to Ta by Bengalis.
>
> It is incorrect and nonsensical to place a vowel sign
> immediately next to a Virama
>
> e.g. the sequence Ta Virama VowelSign.i is wrong. (This
> sequence implies the rendering, VowelSign.i Ta Virama
> (VowelSign.i is reordered). This is illogical).
> Therefore, it follows that it is also nonsensical to place a
> vowel sign immediately after a Khanda Ta (Khanda Ta is
> equivalent to Ta + Virama.)

This is all true. But where does the FAQ suggest a sequence like <Ta Virama
VowelSign.i>?

> In the Hindi script, you may write the sequence Ka Virama Ta
> VowelSign.i, and it may be rendered as VowelSign.i followed
> by a fully legated conjunct. However if you do not want this
> fully legated form you may use the sequence Ka Virama ZWJ Ta
> VowelSign.i and have it rendered as VowelSign.i Half_Ka Ta
>
> Now turning to the Bengali example of Ta Virama Ta VowelSign.i
>
> Ta Virama Ta VowelSign.i may be rendered as: VowelSign.i
> Ta_Ta.fullylegated:
> And going by the FAQ:
> Ta Virama ZWJ Ta VowelSign.i. would be rendered as
> VowelSign.i._KhandaTa Ta
> But this is clearly wrong, as Kanda Ta has now taken on a
> vowel sign, which is illegal.

This example would be wrong... But I don't see it in the FAQ.

> What was needed here was a ZWNJ to separate the Ta Virama
> from the proceeding Ta.
> But according to the FAQ Ta Virama ZWNJ Ta is to be rendered
> as: Ta_Virama.explicit, Ta (Ta with a visible Virama, Ta).
> Which seems to imply that Ta Virama ZWNJ VowelSign.i would be
> rendered as: Ta_Virama.explicit,VowelSign.i Ta:

I don't think the FAQ implies this.

In some Indic scripts (e.g., Devanagari), left-side matras reorder around
the whole consonant cluster; in some other scripts (e.g., Tamil, Malayalam),
they reorder around the base consonant only:

Devanagari: Ta Virama ZWNJ Ta MatraI -> MatraI Ta+Virama Ta

Tamil: Ta Virama (ZWNJ) Ta MatraI -> Ta+Virama MatraI Ta

(Notice that ZWNJ is redundant in Tamil, as the rendering would be identical
without it.)
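
To make the two reordering behaviours concrete, here is a rough Python sketch. It is only a toy model of the two lines above, not a real shaping engine: the token names ("Ta", "Virama", "ZWNJ", "MatraI"), the function name and the "cluster"/"base" switch are all made up for illustration, and it assumes every virama is shown visibly.

```python
def render(tokens, scope):
    """Return the glyph order for one cluster ending in a left-side matra.

    scope="cluster": the matra moves in front of the whole cluster
                     (the Devanagari line above).
    scope="base":    the matra moves in front of the last consonant only
                     (the Tamil line above).
    Toy model: every virama is assumed to stay visible on its consonant.
    """
    glyphs = []
    for tok in tokens:
        if tok == "ZWNJ":
            continue  # the joiner has no glyph of its own
        if tok == "Virama" and glyphs:
            glyphs[-1] += "+Virama"  # attach the virama to the previous consonant
        elif tok == "MatraI":
            # Choose where the left-side matra lands.
            pos = 0 if scope == "cluster" else len(glyphs) - 1
            glyphs.insert(pos, "MatraI")
        else:
            glyphs.append(tok)
    return glyphs


print(render(["Ta", "Virama", "ZWNJ", "Ta", "MatraI"], "cluster"))
print(render(["Ta", "Virama", "ZWNJ", "Ta", "MatraI"], "base"))
print(render(["Ta", "Virama", "Ta", "MatraI"], "base"))
```

In this model the last two calls give the same glyph order, which is the sense in which ZWNJ is redundant for Tamil.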

My assumption is that Bengali, in this respect, behaves like Tamil and
Malayalam.

But this is something which is absolutely not clear from the Unicode Book:
my assumption above is based on discussions on this list, and on
non-Unicode sources such as
<http://www.microsoft.com/typography/otfntdev/indicot/default.htm>.

I think that a FAQ should be provided *by* Unicode about this. Even better,
this should be dealt with in detail in the next edition of the TUS. IMHO,
this is not a typographical detail that can be left to implementers to
settle: it affects the interpretation of text.

> I hope that it is clear from this example that the behaviour
> of Ta Virama in conjunction with ZWJ & ZWNJ needs to be changed.

Why? The purpose of ZWJ and ZWNJ is one of the few things in Indic Unicode
which is quite clear.

A sequence of consonant+Virama+ZWJ always shows a half form glyph (such as
the Half-Ta in Devanagari or the Khanda Ta in Bengali).

OTOH, consonant+Virama+ZWNJ always shows a visible virama attached to a full
form.

The only difference between Devanagari and Bengali arises when *neither* ZWJ
nor ZWNJ follows a virama at the end of a word: Bengali behaves as if a ZWJ
followed the virama, while Devanagari behaves as if a ZW*N*J followed it.
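
The rule in the last three paragraphs can be written down as a small lookup. This is just a sketch of my reading as stated in this message, not something taken from the standard. The function name, the script labels and the result strings are invented for the example, and it only covers a consonant+Virama at the end of a word.

```python
# Which joiner each script behaves "as if" it had when none is typed
# at the end of a word (the claim made in this message).
IMPLIED_JOINER = {"Bengali": "ZWJ", "Devanagari": "ZWNJ"}


def final_virama_form(script, joiner=None):
    """Describe how word-final consonant+Virama(+joiner) is displayed."""
    if joiner is None:
        joiner = IMPLIED_JOINER[script]  # fall back to the script's default
    if joiner == "ZWJ":
        return "half form"
    if joiner == "ZWNJ":
        return "full form + visible virama"
    raise ValueError("joiner must be 'ZWJ', 'ZWNJ' or None")


for script in ("Bengali", "Devanagari"):
    for joiner in (None, "ZWJ", "ZWNJ"):
        print(script, joiner, "->", final_virama_form(script, joiner))
```

With an explicit joiner the two scripts give the same answer; they only differ in the row where the joiner is None.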

> Further more, ZWJ should be used to form half consonants in
> Indic scripts, but it can be seen that Khanda_Ta is not a
> half form as it is regularly used as the last letter of a
> word (half forms never are).

What's wrong with saying that it is a half form?

> The behaviour should be as follows:
>
> Ta Virama ZWNJ ... should lead to KandaTa (i.e the halant form of Ta)

This would be against the normal Indic meaning of ZWNJ, which is: show the
virama.

> e.g. The Bengali word kutsit shall be encoded as:
> Ka VowelSign.u Ta Virama ZWNJ Sa VowelSign.i Ta
> and rendered as:
> Ka VowelSign.u Ta VowelSign.i Sa Ta.
> (ZWNJ marks the separating point hence preventing the
> VowelSign.i. connecting to Ta)

IMHO, that's not needed, because Tamil, Malayalam and Bengali left-side
matras should always reorder only around the last glyph of a consonant
conjunct.

> Ta Virama ZWJ ... should lead to a half form of Ta which I
> suggest should be Ta with a visible Virama (there is no half
> form of Ta in Bengali)

There is one: Khanda-Ta.

> To conclude, I recommend that in general:
>
> Ta Virama ... -> KhandaTa (i.e. Halant form of Ta) if a
> following letter does not naturally legate with it,
> else, Ta Virama ... -> conjunct form

OK.

> Ta Virama ZWNJ -> KandaTa (i.e. explicit Halant form)

Ta Virama ZWNJ -> Ta+Virama (as in all other Indic blocks).

> Ta Virama ZWJ -> Ta Virama (as Bengali dose not have a half
> form of this character).

Ta Virama ZWJ -> Khanda-Ta (ZWJ prevents a possible natural ligature with a
following consonant).
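
For anyone who wants to paste these three sequences into a renderer and see what their font does, a short Python sketch that builds them from code points follows. The dictionary and function names are my own. The code points are U+09A4 and U+09CD as quoted from the FAQ above, plus the two joiners, U+200D and U+200C. What is actually displayed depends entirely on the font and shaping engine, so no rendering is claimed here.

```python
import unicodedata

TA, VIRAMA = "\u09A4", "\u09CD"   # Bengali Ta and Virama
ZWJ, ZWNJ = "\u200D", "\u200C"    # zero width joiner / non-joiner

# The three sequences discussed in the summary above.
SEQUENCES = {
    "Ta Virama": TA + VIRAMA,
    "Ta Virama ZWNJ": TA + VIRAMA + ZWNJ,
    "Ta Virama ZWJ": TA + VIRAMA + ZWJ,
}


def describe(text):
    """List each character as 'U+XXXX NAME' so the encoding is unambiguous."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, seq in SEQUENCES.items():
    print(label, describe(seq))
```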

Regards.
Marco


This archive was generated by hypermail 2.1.5 : Mon Nov 18 2002 - 11:44:27 EST