Re: Why wasn't it possible to encode a coeng-like joiner for Tibetan? from Richard Wordingham on 2013-04-10 (Unicode Mail List Archive)

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 11 Apr 2013 01:54:43 +0100

On Thu, 11 Apr 2013 04:39:45 +0530
Shriramana Sharma <samjnaa_at_gmail.com> wrote:

> Or was the Khmer model of an invisible joiner a *later* bright idea?
> But really that doesn't hold water (I mean the "later" part) because
> the Indic virama model already existed, and whether or not Tibetan
> used the visible virama heavily need not have prevented from a virama
> character, which would have a visible form in appropriate contexts,
> causing stacking in other contexts.

Vowel cancelling and consonant joining are really two different
actions, and in Tai Tham you can see some weird combinations. For
example, the word _cuewaa_ (phonetically CVCV) is spelt in arguably one
akshara as <LOW CA, UUE, TONE-1, SAKOT, WA, AA>. (The argument is over
whether the AA should be considered a base consonant.) It would be
nonsensical to call SAKOT a vowel canceller - the explicit vowels UUE
and AA are present in every sense. Now, Tai Tham does have a vowel
canceller, but because Tai languages have at most a CRVVC structure,
the vowel canceller usually has the important side effect of silencing a
consonant. Now in Thai, this canceller can cancel either consonant in
a European syllable ending in multiple consonants, and I can see this
spreading to the uses of Tai Tham. (For Indic words, the Thai
canceller is only allowed where there is a vowel, either explicit or
implicit. In particular, if there is a final cluster of two
consonants, and it is the first that is silent, it is not marked as
silent.) In particular, I can see a potential contrast between <WA, RA
HAAM, SAKOT, SA> and <WA, SAKOT, SA, RA HAAM>. The contrast can be
made because the stack is not a pure stack, but allows the tail of
<SAKOT, SA> to rise to the top.

The coeng idea saves on code points. It would have been cleaner to
have a separate set of subjoined consonants for Tai Tham, rather than a
mixture of consonant signs and coeng + consonant.

> As for the RA-MGO thing, I still am not sure how it is advisable to
> have a 0F6A glyphically identical to 0F62 and even if a
> default-ignorable ZWNJ would not have been satisfactory, some
> specialized non-default-ignorable conjoining-form-prevention character
> could be defined, which would then also be used for subjoined
> full-form YA RA VA avoiding those extra characters too.

Such a special character might be used where it had no effect, so the
problem cannot be entirely eliminated. It's not far removed from the
visual identity of U+0069 LATIN SMALL LETTER I and U+0131 LATIN SMALL
LETTER DOTLESS I when they have a combining mark above, or the
identity in the Khmer script of <COENG, TA> and <COENG, DA>.
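
To make the comparison concrete, here is a minimal Python sketch, using only the standard unicodedata module, showing that both pairs remain distinct code point sequences under canonical normalization even though they typically render alike. The helper names (code_points, same_after_normalization) and the choice of U+0301 COMBINING ACUTE ACCENT as the "mark above" are illustrative assumptions, not anything from the thread.

```python
import unicodedata


def code_points(text):
    """Return the code points of text in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def same_after_normalization(a, b, form="NFC"):
    """True if a and b are identical after the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Assumption for illustration: U+0301 stands in for "a combining mark above".
ACUTE = "\u0301"
dotted = "\u0069" + ACUTE    # LATIN SMALL LETTER I + acute
dotless = "\u0131" + ACUTE   # LATIN SMALL LETTER DOTLESS I + acute

COENG = "\u17D2"             # KHMER SIGN COENG
coeng_ta = COENG + "\u178F"  # <COENG, TA>
coeng_da = COENG + "\u178A"  # <COENG, DA>

for a, b in [(dotted, dotless), (coeng_ta, coeng_da)]:
    print(code_points(a), "vs", code_points(b),
          "-> equal after NFC:", same_after_normalization(a, b))
```

Normalization does not merge either pair, so any matching of such look-alikes has to happen at a higher level, for example in collation or confusable detection.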

A possible downside of subjoined letters is that they form part of the
default grapheme cluster with the base letter, and some applications
therefore refuse permission to edit them independently. This
restriction is a major annoyance when there are 3 or 4 marks on a base
consonant.
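
As a rough illustration of the clustering effect, the Python sketch below attaches every character whose General_Category is a mark (Mn, Mc, Me) to the preceding base. This is a deliberate simplification and not the UAX #29 algorithm, which uses the Grapheme_Cluster_Break property; the function name approximate_clusters and the sample Tibetan stack are assumptions chosen for the example. Tibetan subjoined letters are nonspacing marks (Mn), so they end up in the same cluster as the base letter.

```python
import unicodedata


def approximate_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified stand-in for default grapheme clusters: any character whose
    general category starts with "M" is attached to the previous cluster.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# Illustrative stack: RA, SUBJOINED GA, SUBJOINED YA, VOWEL SIGN U, then TSHEG.
sample = "\u0F62\u0F92\u0FB1\u0F74\u0F0B"

for cluster in approximate_clusters(sample):
    names = [unicodedata.name(ch) for ch in cluster]
    print(len(cluster), "code point(s):", ", ".join(names))
```

The base letter and its three marks come out as a single cluster of four code points, which is the unit an editor that works cluster by cluster would select or delete as a whole.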

Richard.
Received on Wed Apr 10 2013 - 19:59:15 CDT
