Re: Bangla: [ZWJ], [VIRAMA] and CV sequences

From: Gautam Sengupta (
Date: Tue Oct 07 2003 - 20:44:32 CST

> I don't know what the original motivations were, but one thing about the
> current (ISCII-based) encoding scheme that appeals to me is that on average
> it requires fewer characters than other more natural schemes. Bangla has a
> high percentage of 'vowel signs', each of which would require two characters
> in your scheme as opposed to one in the current one.

There is a trade-off here between file size and the
number of code points used. File size could be further
reduced, for example, if combining forms of consonants
were introduced. But that would be a step in the wrong
direction for various reasons that I will not discuss
here. I am not sure that the right thing to do is to
economize on file size rather than code points.
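
To make the trade-off concrete, here is a rough Python sketch that counts code points for a word in the current encoding and in an alternative where every vowel sign costs two code points instead of one, as the quoted message describes. The helper names are made up for illustration, and the alternative scheme is modelled only by its length; which two characters it would actually use is not assumed here.

```python
import unicodedata


def count_vowel_signs(text: str) -> int:
    """Count dependent vowel signs, identified by their Unicode character name."""
    return sum(1 for ch in text if "VOWEL SIGN" in unicodedata.name(ch, ""))


def length_current(text: str) -> int:
    """Code points used by the current (ISCII-based) encoding."""
    return len(text)


def length_two_char_scheme(text: str) -> int:
    """Code points if each vowel sign took two characters (hypothetical scheme)."""
    return len(text) + count_vowel_signs(text)


# The word "Bangla": BA, VOWEL SIGN AA, ANUSVARA, LA, VOWEL SIGN AA
word = "\u09ac\u09be\u0982\u09b2\u09be"

print(length_current(word), length_two_char_scheme(word))
```

For this word the current encoding uses 5 code points and the two-character scheme would use 7, so the overhead grows with the share of vowel signs in the text.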

> > Also, why not use [CONS][ZWJ][CONS] instead of
> > [CONS][VIRAMA][CONS]? One could then use [VIRAMA] only
> > where it is explicit/visible.
>
> But this would not reflect the fact that the *glyph* [CONS][ZWJ][CONS] is
> actually the same thing as the *sequence of characters* [CONS][VIRAMA][CONS],

But it is not, certainly not in writing; and that's
the whole point. [CONS][ZWJ][CONS] and
[CONS][(EXPLICIT)VIRAMA][CONS] are "identical" at a
level of linguistic abstraction that need not be
reflected in text encoding. Consider [C][L] and
[C][L][VIRAMA]. They represent the same words, they
are the "same" at some level of representation, but
that is irrelevant for the task at hand.
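
The sequences under discussion can be written out as code points. The Python sketch below is illustrative only: the "current" pair follows the Unicode convention in which KA + VIRAMA + TA normally renders as a conjunct and a ZERO WIDTH NON-JOINER after the virama requests a visible one, while the "proposed" pair is one reading of the [CONS][ZWJ][CONS] suggestion, not something any standard defines. The variable and function names are invented for the example.

```python
import unicodedata

KA, TA = "\u0995", "\u09a4"      # two Bangla consonants
VIRAMA = "\u09cd"                # BENGALI SIGN VIRAMA (hasanta)
ZWJ, ZWNJ = "\u200d", "\u200c"   # zero width joiner / non-joiner

# Current Unicode convention
current_conjunct = KA + VIRAMA + TA          # normally rendered as a conjunct
current_explicit = KA + VIRAMA + ZWNJ + TA   # virama kept visible

# Proposal from this thread (hypothetical, not standardized)
proposed_conjunct = KA + ZWJ + TA            # conjunct requested by ZWJ alone
proposed_explicit = KA + VIRAMA + TA         # virama only where it is visible


def describe(text: str) -> list[str]:
    """Return the Unicode character names of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


def same_after_nfc(a: str, b: str) -> bool:
    """True if the two strings are equal after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


for label, seq in [("current conjunct", current_conjunct),
                   ("current explicit", current_explicit),
                   ("proposed conjunct", proposed_conjunct),
                   ("proposed explicit", proposed_explicit)]:
    print(label, describe(seq))
```

Normalization does not merge the conjunct and explicit-virama sequences, so the author's choice between them survives in the stored text. Note also that the proposed explicit form uses exactly the code points that the current encoding treats as a conjunct, which is where the two positions in this thread collide.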

> This latter decision is one that should be taken
> (normally) by the rendering mechanism (loosely
> speaking, the font), not the author.

I disagree. If an author chooses to write a word with
an explicit virama, you have to respect that and let
it be reflected in the encoding. Leaving such
decisions to the rendering engine would destroy the
character and flavor of certain texts. Furthermore,
there are metalinguistic uses of the explicit virama
that need to be kept distinct from forms with
conjoined characters.

Thanks, Deepayan, for your feedback. -Gautam

