Re: Bangla: [ZWJ], [VIRAMA] and CV sequences

From: Gautam Sengupta (
Date: Wed Oct 08 2003 - 20:21:42 CST

--- Kenneth Whistler <> wrote:
> Gautam said:
> > > The encoding of most Indic scripts is based on
> > > ISCII - and that's not going
> > > to change. It was adopted since ISCII was the
> > > pre-existing Indian national
> > > character encoding standard for these scripts.
> >
> > I understand that this is so. But perhaps it is
> > worthwhile for us to be aware of the flaws in ISCII
> > that were inherited by Unicode. It is also necessary
> > to recognize the fact that the bureaucrats in a
> > government are not necessarily the most competent
> > people to adjudicate on how a script should be
> > encoded. I wonder whether the Dept of Electronics,
> > Govt of India, would have any reasons to offer
> > justifying the placement of Assamese /r/ and /v/ and
> > the long syllabic /r/ and /l/ in their current
> > positions.
> Why should they? The positions of these characters in
> the Unicode code chart for the Bengali script have nothing
> to do with the ISCII chart, in any case. They are
> *additions* beyond the ISCII chart.

[Gautam]: Yes, they do. The arrangement is identical
in a new code space of the same size.

> In the case of
> the Assamese letters, these additions separate out
> the *distinct* forms for Assamese /r/ and /v/ from
> the Bangla forms, and *enable* correct sorting, rather
> than inhibiting it. The addition of the long syllabic
> /r/ and /l/ *enables* the representation of Sanskrit
> material in the Bengali script, and the code position in
> the charts is immaterial.

[Gautam]: Nobody is objecting to the addition of these
forms, only to their placement vis-à-vis the other

> By the way, the relevant organization now would be
> TDIL, within the Indian Ministry of Communications and
> Information Technology -- not the Dept. of Electronics.

[Gautam]: Yes, indeed. The ministry has been renamed.
TDIL remains the same.

> But be that as it may, they have nothing to do with
> the code point choices in the range U+09E0..U+09FF,
> as should be clear from the documentation of the
> Unicode Standard. See The Unicode Standard, Version
> 4.0, p. 219, available online.

[Gautam]: I did look up the document and this is what
I found:

"The Devanagari block of the Unicode Standard is based
on ISCII 1988. ...
The Unicode Standard encodes Devanagari characters in
the same relative positions as those coded in
positions A0-F4 in the ISCII 1988 standard. The same
character code layout is followed for eight other
Indic scripts in the Unicode Standard ... This
parallel code layout ... follows the stated intention
of the Indian coding standard to enable one-to-one
mappings between analogous coding positions in
different scripts in the family."

Clearly ISCII has a *lot to do* with code point
choices in the range U+09E0..U+09FF.
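
To make the parallel layout concrete, here is a minimal Python sketch.
It uses only the standard unicodedata module; the helper name
bengali_counterpart and the two constants are illustrative names of my
own, not anything defined by the standard. It relies on the Devanagari
block starting at U+0900 and the Bengali block at U+0980, so that
analogous characters sit at the same offset within each block.

```python
import unicodedata

# Block starts as published in the Unicode code charts.
DEVANAGARI_BASE = 0x0900
BENGALI_BASE = 0x0980
BLOCK_SIZE = 0x80


def bengali_counterpart(devanagari_char):
    """Return the code point at the same block offset in the Bengali block.

    Hypothetical helper for illustration only. The position may be
    unassigned or hold a non-analogous character in Bengali.
    """
    offset = ord(devanagari_char) - DEVANAGARI_BASE
    if not 0 <= offset < BLOCK_SIZE:
        raise ValueError("not a Devanagari-block code point")
    return chr(BENGALI_BASE + offset)


if __name__ == "__main__":
    # KA, RA, and the long syllabic vowels VOCALIC RR and VOCALIC LL.
    for cp in (0x0915, 0x0930, 0x0960, 0x0961):
        src = chr(cp)
        dst = bengali_counterpart(src)
        print(f"U+{cp:04X} {unicodedata.name(src)} -> "
              f"U+{ord(dst):04X} {unicodedata.name(dst, '<unassigned>')}")

    # The Assamese letters live in the Bengali block at U+09F0 and U+09F1.
    for cp in (0x09F0, 0x09F1):
        print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")
```

The long syllabic vowels at U+09E0 and U+09E1 land at the same offsets
as Devanagari U+0960 and U+0961. The Assamese letters at U+09F0 and
U+09F1 carry the names BENGALI LETTER RA WITH MIDDLE DIAGONAL and
BENGALI LETTER RA WITH LOWER DIAGONAL.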

Best, Gautam.

