RE: Bangla: [ZWJ], [VIRAMA] and CV sequences

From: Gautam Sengupta
Date: Wed Oct 08 2003 - 08:02:35 CST

--- Marco Cimarosti wrote:
> > Also, why not use [CONS][ZWJ][CONS] instead of
> > [CONS][VIRAMA][CONS]? One could then use [VIRAMA]
> > only where it is explicit/visible.
> OK. But what happens when the font does not have a
> glyph for the ligature <cons><ZWJ><cons>, nor for
> the half consonant <cons><ZWJ>, nor for the
> subjoined consonant <ZWJ><cons>?
> As <ZWJ>, per se, is an invisible character, what
> happens is that your
> string displays as <cons><cons>, which is clearly
> semantically incorrect. If
> you want the explicit virama to be visible, you need
> to encode it as
> <cons><VIRAMA><cons>.
> And this means that you (the author of the text) are
> forced to choose between
> <ZWJ> and <VIRAMA> based on the availability of
> glyphs in the *particular*
> font that you are using while typing. And this is a
> big no no no, because it
> would prevent you from changing the font without
> re-typing part of the text.
> What happens with the current Unicode scheme is
> that, if the font does not
> have a glyph for the ligature <cons><VIRAMA><cons>,
> nor for the half
> consonant <cons><VIRAMA>, nor for the subjoined
> consonant <VIRAMA><cons>,
> the virama is *automatically* displayed visibly, so
> that the semantics of
> the text is always safe, even if rendered with the
> most stupid of fonts.
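
The fallback behaviour described above can be made concrete with a small toy model. This is only a sketch under stated assumptions, not how any real shaping engine works: the function `render`, the `conjuncts` set (standing in for "the font has a glyph for this pair") and the `zwj_fallback` flag (standing in for the proposed tweak to the rendering engine) are all hypothetical names invented for illustration.

```python
VIRAMA = "\u09CD"  # BENGALI SIGN VIRAMA (hasant)
ZWJ = "\u200D"     # ZERO WIDTH JOINER
KA = "\u0995"      # BENGALI LETTER KA
YA = "\u09AF"      # BENGALI LETTER YA


def render(text, conjuncts, zwj_fallback=False):
    """Return the list of glyphs a toy renderer would draw.

    conjuncts    -- set of (first, second) pairs the hypothetical font can join
    zwj_fallback -- hypothetical tweak: draw a visible hasant for an
                    unsupported <ZWJ> join instead of drawing nothing
    """
    glyphs = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        after = text[i + 2] if i + 2 < len(text) else ""
        # The font has a joined form for this pair: draw it as one glyph.
        if nxt in (VIRAMA, ZWJ) and after and (ch, after) in conjuncts:
            glyphs.append(f"conjunct({ch}{after})")
            i += 3
            continue
        if ch == VIRAMA:
            # No joined form available: the virama stays visible.
            glyphs.append("visible-hasant")
        elif ch == ZWJ:
            # ZWJ is invisible by itself unless the engine is tweaked.
            if zwj_fallback:
                glyphs.append("visible-hasant")
        else:
            glyphs.append(ch)
        i += 1
    return glyphs


poor_font = set()  # a font with no conjunct glyphs at all
print(render(KA + VIRAMA + YA, poor_font))                 # hasant is drawn
print(render(KA + ZWJ + YA, poor_font))                    # hasant is lost
print(render(KA + ZWJ + YA, poor_font, zwj_fallback=True))  # tweak restores it
```

In this model the <VIRAMA> encoding degrades safely with no extra work, while the <ZWJ> encoding needs the additional fallback rule in the engine to avoid showing a bare <cons><cons>.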

I am no programmer, but surely the rendering engine
could be tweaked to display a halant/hashant in the
aforementioned situations? I understand that it won't
happen *automatically* if we were to use <ZWJ> instead
of <VIRAMA>. But if you were to take the trouble to do
the tweaking, you'd then have a completely *intuitive*
encoding for vowel yaphala sequences,
<vowel><ZWJ><Y>, instead of oddities like
<vowel><VIRAMA><Y>.

> > Surely, [A/E][ZWJ][Y][ZWJ][AA] is more "natural"
> > and intuitively acceptable than any encoding in
> > which a vowel is followed by a [VIRAMA]?
> Maybe. But I see no reason why being natural or
> intuitive should be seen as
> key feature for an encoding system. That might be
> the case for an encoding
> system designed to be used by humans, but Unicode is
> designed to be used by
> computers, so I don't see the problem.

Perhaps there isn't a *problem* as such, and perhaps
naturalness and intuitive acceptability aren't *key*
features of the system, but surely, other factors being
equal, they ought to be taken into consideration in
choosing one method of encoding over another?

> I assume that in a well designed Bengali input
> method, yaphala would be a
> key on its own,
> so, by the point of view of the user, it is just a
> "character": they don't need to know that when they
> press that key the
> sequence of codes <VIRAMA><YA> will actually be
> inserted, so they won't
> notice the apparent nonsense of the sequence
> <vowel><VIRAMA> and, as we say
> in Italy, "If eye doesn't see, heart doesn't hurt".

No, YAPHALA won't be a character on its own, only Y
will be. The -PHALA in YAPHALA indicates that it is a
combining variant of a grapheme. YAPHALA will be a
combining variant of Y to be inserted by the rendering
engine in the appropriate environment. The user will
*see* and key in the <ZWJ> between a consonant and a
<Y> (or a vowel and <Y>) in order to make the latter show
up as a yaphala.
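
For readers who want to see the two candidate encodings as actual code points, here is a minimal Python sketch. It assumes that [A] is U+0985, [Y] is U+09AF, and that [AA] is the vowel sign U+09BE; the thread does not say whether the independent letter or the vowel sign is meant, so that choice is an assumption. The helper `describe` is a hypothetical name.

```python
import unicodedata


def describe(seq):
    """List each code point in seq with its Unicode character name."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in seq]


# <vowel><VIRAMA><Y><AA>: the virama-based style of sequence discussed above
virama_style = "\u0985\u09CD\u09AF\u09BE"

# [A][ZWJ][Y][ZWJ][AA]: the ZWJ-based sequence proposed in this thread
zwj_style = "\u0985\u200D\u09AF\u200D\u09BE"

for label, seq in (("virama", virama_style), ("zwj", zwj_style)):
    print(label)
    for line in describe(seq):
        print("  " + line)
```

Either sequence is just a string of code points to the computer; the disagreement is over which one the rendering engine and the user should treat as the request for a yaphala.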

Marco, thank you *very* much for your extremely
helpful comments and feedback. Best, Gautam.

