Re: Bangla: [ZWJ], [VIRAMA] and CV sequences

From: Ananda (ananda@bdlink.com)
Date: Tue Oct 07 2003 - 21:29:12 CST


In our Bijoy Bangla software, we in fact do not have 9 of the vowels in the
code or on the keyboard (o and oo are the exceptions, as those two characters
do not have vowel signs). When someone types those vowels on the Bijoy
keyboard, they enter Hasanta and the vowel sign, and in Unicode those two
characters are saved. For Ki it is Ka+Ekar, but for Koi it is Ka+Hasanta+Ekar.
When someone enters conjuncts, Hasanta is in the middle. If we talk about
reducing the codes, we can do that: we do not need two codes for a vowel and
its vowel sign. Unfortunately, basic characters like Khanda Ta, Antostho Ba,
Danrhi and Dui Danrhi have not been coded, whereas we got two codes for one
character, say for the vowels.
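
The doubling of vowel and vowel-sign codes mentioned above is visible in the
Unicode character names themselves. What follows is only a minimal Python
sketch using the standard unicodedata module; the helper name sign_for is
made up for illustration, and this is not a description of how Bijoy stores
text.

```python
import unicodedata

def sign_for(vowel_letter):
    """Return the dependent vowel sign paired with an independent Bengali
    vowel letter, or None if no such sign is encoded (hypothetical helper)."""
    name = unicodedata.name(vowel_letter, "")
    prefix = "BENGALI LETTER "
    if not name.startswith(prefix):
        return None
    try:
        # A sign, when it exists, carries the same suffix as the letter name.
        return unicodedata.lookup("BENGALI VOWEL SIGN " + name[len(prefix):])
    except KeyError:
        return None

# Walk the independent-vowel range U+0985..U+0994 and list the paired codes.
for cp in range(0x0985, 0x0995):
    letter = chr(cp)
    sign = sign_for(letter)
    if sign is not None:
        print(f"U+{cp:04X} {unicodedata.name(letter)} <-> U+{ord(sign):04X}")
```

Letters with no matching sign name, such as BENGALI LETTER A, simply return
None and are skipped.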
If we leave the shaping of characters to the rendering engine, do we need
those 9 codes? In reality we are also constrained by the input method. As all
the coded characters cannot be accommodated on the keyboard, we shall have
to use Hasanta as the link.
Mustafa Jabbar
------------- Original message follows -------------

On Tuesday 07 October 2003 12:21, Gautam Sengupta wrote:
> Is there any reason (apart from trying to be
> ISCII-conformant) why the Bangla word /ki/ "what"
> cannot be encoded as [KA][ZWJ][I]? Do we really need
> combining forms of vowels to encode Indian scripts?

I don't know what the original motivations were, but one thing about the
current (ISCII-based) encoding scheme that appeals to me is that on average
it requires fewer characters than other more natural schemes. Bangla has a
high percentage of 'vowel signs', each of which would require two characters
in your scheme as opposed to one in the current one.
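
To make the size difference concrete, here is a small Python sketch comparing
the two encodings of /ki/. The code points are the standard Bengali ones; the
helper name cost is just an illustration, and UTF-8 is assumed as the storage
form.

```python
KA = "\u0995"        # BENGALI LETTER KA
LETTER_I = "\u0987"  # BENGALI LETTER I (independent vowel)
SIGN_I = "\u09BF"    # BENGALI VOWEL SIGN I (dependent form)
ZWJ = "\u200D"       # ZERO WIDTH JOINER

current = KA + SIGN_I            # the existing ISCII-style encoding of /ki/
proposed = KA + ZWJ + LETTER_I   # the [KA][ZWJ][I] alternative

def cost(text):
    """Return (code points, UTF-8 bytes) for a string."""
    return len(text), len(text.encode("utf-8"))

print(cost(current))   # (2, 6)
print(cost(proposed))  # (3, 9)
```

So every vowel sign costs one extra code point (three extra bytes in UTF-8)
under the ZWJ scheme.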

> Also, why not use [CONS][ZWJ][CONS] instead of
> [CONS][VIRAMA][CONS]? One could then use [VIRAMA] only
> where it is explicit/visible.

But this would not reflect the fact that the *glyph* [CONS][ZWJ][CONS] is
actually the same thing as the *sequence of characters* [CONS][VIRAMA][CONS],
i.e., [CONS][VIRAMA][ZWNJ][CONS] is also a perfectly legitimate
representation. This latter decision is one that should be taken (normally)
by the rendering mechanism (loosely speaking, the font), not the author.
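
As a rough illustration of the point, the Python sketch below builds both
forms for one conjunct (KA + SSA, chosen arbitrarily) and shows that they
differ only by the ZWNJ rendering hint. The helper strip_joiners is a made-up
name, and dropping joiners like this is only a comparison aid, not a
recommended normalization.

```python
KA = "\u0995"      # BENGALI LETTER KA
SSA = "\u09B7"     # BENGALI LETTER SSA
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasanta)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER

def strip_joiners(text):
    """Drop ZWJ/ZWNJ so only the underlying letters and viramas remain."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")

conjunct = KA + VIRAMA + SSA         # renderer is free to form the ligature
explicit = KA + VIRAMA + ZWNJ + SSA  # asks for a visible hasanta instead

print(explicit == conjunct)                 # False
print(strip_joiners(explicit) == conjunct)  # True
```

The stored letters are the same in both cases; only the hint to the renderer
changes.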

Deepayan


