Re: Bangla: [ZWJ], [VIRAMA] and CV sequences

From: Gautam Sengupta (gsghyd@yahoo.com)
Date: Wed Oct 08 2003 - 10:27:32 CST


--- Christopher John Fynn <cfynn@gmx.net> wrote:
> "Gautam Sengupta" <gsghyd@yahoo.com> wrote:

> > Is there any reason (apart from trying to be
> > ISCII-conformant) why the Bangla word /ki/ "what"
> > cannot be encoded as [KA][ZWJ][I]? Do we really need
> > combining forms of vowels to encode Indian scripts?
>
> The encoding of most Indic scripts is based on ISCII - and that's
> not going to change. It was adopted since ISCII was the pre-existing
> Indian national character encoding standard for these scripts.

I understand that this is so. But perhaps it is
worthwhile for us to be aware of the flaws in ISCII
that were inherited by Unicode. It is also necessary
to recognize the fact that the bureaucrats in a
government are not necessarily the most competent
people to adjudicate on how a script should be
encoded. I wonder whether the Dept of Electronics,
Govt of India, would have any reasons to offer
justifying the placement of Assamese /r/ and /v/ and
the long syllabic /r/ and /l/ in their current
positions.
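
For reference, a minimal Python sketch that lists where these letters sit in the Bengali block. Which code points are meant is an assumption on my part (U+09F0 and U+09F1 for the Assamese letters, U+09E0 and U+09E1 for the long syllabic vowels); the names `LETTERS`, `CONSONANTS`, `VOWELS` and `placement` are only illustrative.

```python
import unicodedata

# Assumed identification of the characters mentioned above.
LETTERS = {
    "Assamese /r/": "\u09F0",
    "Assamese /v/": "\u09F1",
    "long syllabic /r/": "\u09E0",
    "long syllabic /l/": "\u09E1",
}

# Main runs of the Bengali block: independent vowels A..AU, consonants KA..HA.
VOWELS = range(0x0985, 0x0994 + 1)
CONSONANTS = range(0x0995, 0x09B9 + 1)


def placement(ch):
    """Report a character's code point, name, and whether it lies in the main runs."""
    cp = ord(ch)
    return {
        "code_point": f"U+{cp:04X}",
        "name": unicodedata.name(ch),
        "in_vowel_run": cp in VOWELS,
        "in_consonant_run": cp in CONSONANTS,
    }


for label, ch in LETTERS.items():
    print(label, placement(ch))
```

Under that assumption, none of the four letters falls inside the contiguous vowel or consonant run; all of them sit after the main runs, near the digits.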

> Another model could have been followed. For example in Tibetan,
> isolated vowels are encoded as:
> 0F68
> 0F68 0F71
> 0F68 0F72
> 0F68 0F73 [0F68 0F71 0F72]
> 0F68 0F74
> 0F68 0F75 [0F68 0F71 0F74]
> 0F62 0F80
> 0F62 0F81 [0F62 0F71 0F80]
> 0F63 0F80
> 0F63 0F81 [0F63 0F71 0F80]
> 0F68 0F7A
> 0F68 0F7B
> 0F68 0F7C
> 0F68 0F7D
> 0F68 0F7E
> 0F68 0F7F

This would have been more appropriate, and possibly
more economical.
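
As a rough illustration of the quoted list, the Python sketch below checks that each bracketed spelling is canonically equivalent to the shorter one before it, using the standard `unicodedata` module. The names `PAIRS`, `hexes` and `equivalent` are made up for this sketch.

```python
import unicodedata

# (short spelling, bracketed spelling) taken from the quoted Tibetan list.
PAIRS = [
    ("\u0F68\u0F73", "\u0F68\u0F71\u0F72"),
    ("\u0F68\u0F75", "\u0F68\u0F71\u0F74"),
    ("\u0F62\u0F81", "\u0F62\u0F71\u0F80"),
    ("\u0F63\u0F81", "\u0F63\u0F71\u0F80"),
]


def hexes(text):
    """Render a string as space-separated four-digit code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


def equivalent(a, b):
    """True when two strings have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


for short, long_form in PAIRS:
    print(hexes(short), "~", hexes(long_form), equivalent(short, long_form))
```

In every case the vowel is a base letter followed by one or more vowel signs; no separate independent vowel letters are involved, which is the economy referred to above.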
 
> > Also, why not use [CONS][ZWJ][CONS] instead of
> > [CONS][VIRAMA][CONS]? One could then use [VIRAMA] only
> > where it is explicit/visible.
>
> There was a third possibility: In the Tibetan encoding a second set
> of explicitly combining consonants was encoded
>
> So you have [CONS] [COMBINING CONS]
> Instead of [CONS][VIRAMA][CONS] or
> [CONS][ZWJ][CONS]
>
This would have been difficult for the Indian scripts.
There would be too many combining forms. We would need
many more code points.
>
> This was done because a) although a Virama character exists in
> Tibetan very few Tibetans know what it means since it is almost
> never written and never occurs in ordinary text. b) In many
> combinations it is totally unacceptable
>
The use of ligatures in Indian scripts is not as much
a matter of choice as it is often assumed to be. For
example the word /strii/ "woman" written as
<S><VIRAMA><T><VIRAMA><R><II> would be totally
unacceptable in Bangla and most other Indian scripts.
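
To make the sequences under discussion concrete, here is a small Python sketch that spells out /ki/ and /strii/ both as they are encoded today and as they would look under the [ZWJ] proposal. The "proposed" spellings are hypothetical and are not defined by Unicode; for /strii/ only the consonant cluster is changed. The names `SPELLINGS` and `describe` are illustrative.

```python
import unicodedata

KA, SA, TA, RA = "\u0995", "\u09B8", "\u09A4", "\u09B0"
VIRAMA = "\u09CD"    # BENGALI SIGN VIRAMA
ZWJ = "\u200D"       # ZERO WIDTH JOINER
LETTER_I = "\u0987"  # independent vowel I
SIGN_I = "\u09BF"    # dependent (combining) vowel sign I
SIGN_II = "\u09C0"   # dependent (combining) vowel sign II

SPELLINGS = {
    "ki, current": KA + SIGN_I,
    # Hypothetical: the [KA][ZWJ][I] spelling asked about in this thread.
    "ki, proposed": KA + ZWJ + LETTER_I,
    "strii, current": SA + VIRAMA + TA + VIRAMA + RA + SIGN_II,
    # Hypothetical: [CONS][ZWJ][CONS] for the conjunct, vowel sign unchanged.
    "strii, proposed": SA + ZWJ + TA + ZWJ + RA + SIGN_II,
}


def describe(text):
    """List each character of a string as (code point, Unicode name)."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for label, text in SPELLINGS.items():
    print(label)
    for code_point, name in describe(text):
        print("   ", code_point, name)
```

Note that the current spelling of /strii/ contains two [VIRAMA] characters in the stored text even though an acceptable rendering shows neither of them.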

> In other words the ISCII model was not suitable for
> Tibetan so a different encoding model was adopted.
>
*In its current implementation/interpretation* it
doesn't seem to be very suitable for Indian scripts
either.
-Gautam



