Re: Bangla: [ZWJ], [VIRAMA] and CV sequences

From: Christopher John Fynn (cfynn@gmx.net)
Date: Wed Oct 08 2003 - 08:41:41 CST


----- Original Message -----
From: "Peter Kirk" <peterkirk@qaya.org>
To: "Marco Cimarosti" <marco.cimarosti@essetre.it>
Cc: <unicode@unicode.org>
Sent: Wednesday, October 08, 2003 11:54 AM
Subject: Re: Bangla: [ZWJ], [VIRAMA] and CV sequences

> On 08/10/2003 02:58, Marco Cimarosti wrote:

> >
> >What happens with the current Unicode scheme is that, if the font does not
> >have a glyph for the ligature <cons><VIRAMA><cons>, nor for the half
> >consonant <cons><VIRAMA>, nor for the subjoined consonant <VIRAMA><cons>,
> >the virama is *automatically* displayed visibly, so that the semantics of
> >the text is always safe, even if rendered with the most stupid of fonts.
> >

Yes

> I don't understand the specific issues here... But it does seem a rather
> strange design principle that we should expect a text to be displayed
> meaningfully even when the font lacks the glyphs required for proper
> display. I would have thought it better not to attempt to display
> properly, perhaps display boxes as an indication of an error or trigger
> substitution by a font which does have the glyphs. After all, presumably
> those who write Bangla regularly will use a font which does have the
> necessary glyphs, and those who write it occasionally should be warned
> to find and change to such a font rather than misled into thinking
> things are OK.
>

Simplistically put, every Indic consonant normally carries an inherent vowel
"A". When this vowel is not wanted, the consonant is usually written as a
ligature joined (often in half form) with the following consonant. Another
way of removing the inherent vowel is to write a virama (halant) under the
consonant. (Both forms are readable, but the first is usually the preferred
and expected form.)

In old handwritten orthography a large number of ligatures were used. With
metal type, some typefaces lacked sorts (precomposed glyphs) for the less
frequent combinations. This was worked around by printing consonant +
virama + consonant in place of the ligature.

So <consonant virama consonant> (where the virama is displayed below the
first consonant) is equivalent to a ligature of the two consonants, though
writing the visible virama is usually not good typography.

In Unicode the virama (094D in Devanagari, 09CD in Bengali) is used between
two consonants to indicate that they should be displayed as a ligature. If
the font does not have a glyph for the ligature then a virama should be
displayed under the first consonant (to indicate that the inherent vowel is
killed).

If a ligature glyph for the two consonants is available and displayed then
the virama glyph is not displayed. So in effect the virama character
functions as a kind of ZWJ between two consonants but if, due to font
limitations*, a joined ligature cannot be displayed then the virama should
be displayed under the preceding character.

If a user wants to force a virama to be displayed (and prevent the ligature
form of the two consonants) then she can enter two virama characters after
the first consonant (and a glyph for *one* of these should be displayed
under the first consonant in the pair).

A virama typed after a consonant with no following consonant should always
be displayed.
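
The display rules described above can be sketched in a few lines of Python.
This is only a toy model under stated assumptions, not how any real shaping
engine works: the "font" is a hypothetical set of consonant tuples for which
a ligature glyph exists, glyphs are represented by plain strings, and the
names shape, VISIBLE and is_consonant are invented for illustration.
Devanagari code points are used; for Bengali the virama would be U+09CD.

```python
VIRAMA = "\u094D"      # DEVANAGARI SIGN VIRAMA
VISIBLE = "<virama>"   # stand-in name for a visibly drawn virama glyph


def is_consonant(ch):
    # Devanagari consonants KA..HA occupy U+0915..U+0939.
    return "\u0915" <= ch <= "\u0939"


def shape(text, ligatures):
    """Return a list of glyph names for text.

    ligatures is a hypothetical font table: a set of consonant tuples
    for which a ligature glyph is available.
    """
    glyphs = []
    i, n = 0, len(text)
    while i < n:
        if not is_consonant(text[i]):
            glyphs.append(text[i])
            i += 1
            continue
        # Collect consonant (virama consonant)* as one candidate cluster.
        cluster = [text[i]]
        i += 1
        while i + 1 < n and text[i] == VIRAMA and is_consonant(text[i + 1]):
            cluster.append(text[i + 1])
            i += 2
        if len(cluster) > 1 and tuple(cluster) in ligatures:
            # Ligature available: the virama itself is not drawn.
            glyphs.append("lig:" + "".join(cluster))
        else:
            # No ligature: draw a visible virama after each non-final consonant.
            for k, c in enumerate(cluster):
                glyphs.append(c)
                if k < len(cluster) - 1:
                    glyphs.append(VISIBLE)
        # A virama not followed by a consonant is always drawn. A doubled
        # virama is consumed whole and drawn once (the "forced" form).
        if i < n and text[i] == VIRAMA:
            glyphs.append(VISIBLE)
            i += 2 if (i + 1 < n and text[i + 1] == VIRAMA) else 1
    return glyphs
```

With KA = U+0915 and SSA = U+0937, a font table containing (KA, SSA) turns
KA + virama + SSA into a single ligature glyph, while an empty table, a
doubled virama, or a trailing virama all produce a visible virama.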

It should be noted that there are combinations (ligatures) of several
consonants - to form these, a virama character would have to be entered
after each consonant character in the combination except the final one.
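
As a small illustration of the encoding side, the character sequence for a
multi-consonant combination is just the consonants joined by viramas. The
helper name conjunct is made up for this sketch, and the example again uses
Devanagari code points.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA


def conjunct(consonants):
    """Join consonants with a virama after each one except the last."""
    return VIRAMA.join(consonants)


def code_points(s):
    """List the code points of s as hex strings, for inspection."""
    return ["%04X" % ord(ch) for ch in s]


# SA + TA + RA (U+0938, U+0924, U+0930) needs two viramas.
STRA = conjunct(["\u0938", "\u0924", "\u0930"])
```

Here code_points(STRA) lists 0938, 094D, 0924, 094D, 0930: three consonants
and two viramas, with none after the final consonant.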

===

The model used for encoding Tibetan is different: two sets of consonants
were encoded. The first set (0F40-0F6A) is used for isolated consonants
and for the first consonant in any combination; the second set (0F90-0FBC)
explicitly combines with the preceding consonant. So the Tibetan virama
(0F84) is not needed as a joiner character, and when it occurs it should
always be displayed as a combining glyph. In the Tibetan encoding isolated
forms of vowels are also unnecessary.
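
For comparison, a Tibetan stack can be built without any joiner, simply by
taking every consonant after the first from the subjoined set. The sketch
below assumes the regular layout in which the subjoined form of a letter in
0F40-0F69 sits 0x50 code points higher; it deliberately rejects 0F48
(unassigned) and anything outside that range, such as the fixed-form
letters. The function names are invented for illustration.

```python
import unicodedata

SUBJOIN_OFFSET = 0x50  # e.g. U+0F40 KA -> U+0F90 SUBJOINED KA


def tibetan_stack(letters):
    """Encode a stack: first letter as is, the rest as subjoined letters."""
    head, *rest = letters
    out = [head]
    for ch in rest:
        cp = ord(ch)
        if not (0x0F40 <= cp <= 0x0F69) or cp == 0x0F48:
            raise ValueError("no regular subjoined form for U+%04X" % cp)
        out.append(chr(cp + SUBJOIN_OFFSET))
    return "".join(out)


def describe(s):
    """Return the Unicode character names in s, for inspection."""
    return [unicodedata.name(ch) for ch in s]
```

For example, tibetan_stack("\u0F66\u0F40") gives SA followed by subjoined KA
(U+0F66 U+0F90), with no virama anywhere in the sequence.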

- Chris

* e.g. in a pan-Unicode font like Arial Unicode or Code2000 it would
probably not be practical to support all the ligatures for all the Indic
scripts. In such cases, if the virama is displayed, the text is still
readable.


