RE: Bangla: [ZWJ], [VIRAMA] and CV sequences

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Thu Oct 09 2003 - 03:03:43 CST


Gautam Sengupta wrote:
> --- Marco Cimarosti wrote:
> > OK but, then, your <ZWJ> becomes exactly what
> > Unicode's <VIRAMA> has always
> > been: [...]
>
> You are absolutely right. I am suggesting that the
> language-specific viramas be retained as
> script-specific *explicit* viramas that never
> disappear. In addition, let's have a script-specific
> ZWJ which behaves in the way you describe in the
> preceding paragraph.

Good, good. We are making small steps forward.

What you are really asking for is that each Indic script has *two* viramas:

- a "soft virama", which is normally invisible and only displays visibly in
special cases (no ligatures for that cluster);

- a "hard virama" (or "explicit virama", as you correctly called it), which
always displays as such and never ligates with adjacent characters.

Let's assume that it would be handy to assign these two viramas to different
keys on the keyboard. Or, even better, let's put both on a single key: the
"soft virama" on the plain key and the "hard virama" on SHIFT plus that key,
OK? To avoid misunderstandings with the term "virama", let's label this key
"JOINER".

Now, this is what you *already* have in Unicode! On our hypothetical Bangla
keyboard:

- the "soft virama" (the plain JOINER key) is Unicode's <BENGALI SIGN
VIRAMA>;

- the "hard virama" (the SHIFT+JOINER key) is Unicode's <BENGALI SIGN
VIRAMA>+<ZWNJ>.

Not only does Unicode allow all of the above, but it also has a third kind of
"virama", which may or may not be useful in Bangla but is certainly useful
in Devanagari and Gujarati:

- the "half-consonant virama" (let's assign it to the ALT+JOINER key on our
hypothetical keyboard) which forces the preceding consonant to be displayed
as a half consonant, if possible. This is Unicode's <BENGALI SIGN
VIRAMA>+<ZWJ>.

Notice that, once you have these three "viramas" on your keyboard, you don't
need to have keys for <ZWJ> and <ZWNJ>, as their only use, in Indic, is
after a <xxx SIGN VIRAMA>.
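
To make the three JOINER keys concrete, here is a minimal Python sketch of
what such a keyboard would send. The key names ("JOINER", "SHIFT+JOINER",
"ALT+JOINER") and the helper type_cluster() are only labels I made up for
this illustration; the code points are the standard ones for <BENGALI SIGN
VIRAMA>, <ZWNJ> and <ZWJ>.

```python
# Hypothetical keyboard table: each JOINER key variant sends one or two
# code points. The key names are illustrative, not part of any real layout.
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER

JOINER_KEYS = {
    "JOINER": VIRAMA,               # "soft virama": ligate if possible
    "SHIFT+JOINER": VIRAMA + ZWNJ,  # "hard virama": always visible
    "ALT+JOINER": VIRAMA + ZWJ,     # "half-consonant virama"
}


def type_cluster(first, key, second):
    """Return the code points sent for <first> <JOINER variant> <second>."""
    return first + JOINER_KEYS[key] + second


KA, SSA = "\u0995", "\u09B7"  # BENGALI LETTER KA, BENGALI LETTER SSA

if __name__ == "__main__":
    for key in JOINER_KEYS:
        seq = type_cluster(KA, key, SSA)
        print(key, " ".join(f"U+{ord(c):04X}" for c in seq))
```

How each sequence is actually displayed still depends on the font and the
rendering engine; the table only shows what gets stored.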

Apart from the fact that two of the three viramas are encoded as a *pair* of
code points, how does the *current* Unicode model prevent you from
implementing the clean theoretical model that you have in mind?

> [...]
> > - independent and dependent vowels were the same
> > characters;
> [...]
>
> I agree with you on all of these issues. You have in
> fact summed up my critique of the ISCII/Unicode model.

OK. But are you sure that this critique should necessarily be directed at the
*encoding* model, rather than at some other part of the chain? I'll now try
to demonstrate how the redundancy of dependent/independent vowels may also
be solved at the *keyboard* level.

You are certainly aware that some national keyboards have the so-called
"dead keys". A dead key is a key which does not immediately send (a)
character(s) to the application but waits for a second key; in European
keyboards dead keys are used to type accented letters. E.g., let's see how
accented letters are typed on the Spanish keyboard (which, BTW, is by far
the best designed keyboard in Western Europe):

1. If you press the <´> key, nothing is sent to the application, but the
keystroke is memorized by the keyboard driver.

2. If you now press one of <a>, <e>, <i>, <o>, <u> or <y> keys, characters
<á>, <é>, <í>, <ó>, <ú> or <ý> are sent to the application.

3. If you press the space bar, character <´> itself is sent to the
application.

4. If you press any other key, e.g. <m>, the two characters <´> and <m> are
sent to the application in this order.
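
The four rules above amount to a tiny state machine. Here is a rough Python
sketch of it; dead_key_driver() and its parameters are names I invented for
the illustration, and a real keyboard driver is of course not written this
way.

```python
# Sketch of the dead-key rules listed above, for the acute accent only.
ACUTE = "\u00B4"  # the <´> key
ACCENTED = {
    "a": "\u00E1", "e": "\u00E9", "i": "\u00ED",
    "o": "\u00F3", "u": "\u00FA", "y": "\u00FD",
}


def dead_key_driver(keys, dead=ACUTE, combos=ACCENTED):
    """Turn a sequence of keystrokes into the characters sent to the app."""
    out = []
    pending = False  # True while a dead key is memorized (rule 1)
    for key in keys:
        if not pending:
            if key == dead:
                pending = True       # rule 1: send nothing yet
            else:
                out.append(key)      # ordinary key, no dead key pending
            continue
        pending = False
        if key in combos:
            out.append(combos[key])  # rule 2: send the accented letter
        elif key == " ":
            out.append(dead)         # rule 3: send the accent itself
        else:
            out.append(dead + key)   # rule 4: send accent, then the key
    # A dead key still pending at the end of input sends nothing.
    return "".join(out)


if __name__ == "__main__":
    print(dead_key_driver("c" + ACUTE + "omo"))
```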

Now, in the description above substitute:

- the <´> key with <0985 BENGALI LETTER A> (but let's label it "VIRTUAL
CONSONANT");

- the <a> ... <y> keys with <09BE BENGALI VOWEL SIGN AA> ... <09CC BENGALI
VOWEL SIGN AU>;

- the <á> ... <ý> characters with <0986 BENGALI LETTER AA> ... <0994 BENGALI
LETTER AU>.

What you have is a Bangla keyboard where dependent vowels are typed with a
single <vowel> keystroke, and independent vowels are typed with the sequence
<VIRTUAL CONSONANT>+<vowel>.

Do you prefer your <cons>+<VIRAMA>+<vowel> model? Personally, I find it
suboptimal, as it requires, on average, more keystrokes. However, if that's
what you want, in the Spanish keyboard description above substitute:

- the <´> key with the unshifted <JOINER> (= virama) key that we have
already defined above;

- the <a> ... <y> keys with <0986 BENGALI LETTER AA> ... <0994 BENGALI LETTER
AU>;

- the <á> ... <ý> characters with <09BE BENGALI VOWEL SIGN AA> ... <09CC
BENGALI VOWEL SIGN AU>.

Now you have a Bangla keyboard where independent vowels are typed with a
single keystroke, and dependent vowels are typed with the sequence
<JOINER>+<vowel>.
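
Both Bangla layouts described above can be sketched with one dead-key
routine and two lookup tables that are each other's inverse. This is only an
illustration under my own assumptions: run_dead_key(),
virtual_consonant_layout() and joiner_layout() are made-up names, each key is
identified by the character printed on it, and the vowel table lists only the
pairs from AA to AU that have both an independent letter and a dependent
sign.

```python
# Independent vowel letter -> matching dependent vowel sign (Bengali block).
LETTER_TO_SIGN = {
    "\u0986": "\u09BE",  # AA
    "\u0987": "\u09BF",  # I
    "\u0988": "\u09C0",  # II
    "\u0989": "\u09C1",  # U
    "\u098A": "\u09C2",  # UU
    "\u098B": "\u09C3",  # VOCALIC R
    "\u098F": "\u09C7",  # E
    "\u0990": "\u09C8",  # AI
    "\u0993": "\u09CB",  # O
    "\u0994": "\u09CC",  # AU
}
SIGN_TO_LETTER = {sign: letter for letter, sign in LETTER_TO_SIGN.items()}

LETTER_A = "\u0985"  # used as the "VIRTUAL CONSONANT" dead key
VIRAMA = "\u09CD"    # used as the unshifted "JOINER" dead key


def run_dead_key(keys, dead, combos):
    """Apply the Spanish-style dead-key rules with a given dead key/table."""
    out = []
    pending = False
    for key in keys:
        if not pending:
            if key == dead:
                pending = True          # memorize, send nothing yet
            else:
                out.append(key)
            continue
        pending = False
        if key in combos:
            out.append(combos[key])     # dead key + vowel key
        elif key == " ":
            out.append(dead)            # dead key + space
        else:
            out.append(dead + key)      # dead key + any other key
    return "".join(out)


def virtual_consonant_layout(keys):
    """Vowel keys send signs; VIRTUAL CONSONANT + vowel sends a letter."""
    return run_dead_key(keys, LETTER_A, SIGN_TO_LETTER)


def joiner_layout(keys):
    """Vowel keys send letters; JOINER + vowel sends a sign."""
    return run_dead_key(keys, VIRAMA, LETTER_TO_SIGN)


if __name__ == "__main__":
    MA = "\u09AE"  # BENGALI LETTER MA
    # Two ways to type <LETTER AA> <LETTER MA> <VOWEL SIGN I>:
    a = virtual_consonant_layout(LETTER_A + "\u09BE" + MA + "\u09BF")
    b = joiner_layout("\u0986" + MA + VIRAMA + "\u0987")
    print(a == b)
```

Note that in the JOINER layout a consonant typed after JOINER falls under the
"any other key" rule, so <cons>+<JOINER>+<cons> still sends the virama between
the two consonants, as needed for conjuncts.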

_ Marco


