RE: Encoding Bengali Vowel forms (again)

From: Marco.Cimarosti@icl.com
Date: Thu Apr 27 2000 - 08:05:07 EDT


As usual, I cannot resist putting in my two cents :-|

As I am not an expert on Indic scripts, and Bengali in particular, please
everybody feel free to comment on what I am saying, or even to demolish
my opinions entirely; I do appreciate learning things.

Abdul Malik wrote in his report:
> The problem
> Unicode allows conjunct part glyphs such as zophola to be
> formed only by placing the Virama sign between two
> consonants. When ‘zophola AA_sign’ is placed after Letter_E
> or Letter_AA it is not considered to form conjunct with the
> vowel, it only serves to function as a vowel modifier.
> The zophola-AA sign can not be included in Unicode as a vowel
> modifier sign however, as when placed after a consonant it is
> considered to have different semantics. It would also be
> illegal to place it after a vowel sign.

As I see it, this is a non-problem, arising from some mistaken assumptions.
Here they are, carefully and precisely described by Abdul Malik himself:

        1) "Unicode allows conjunct part glyphs [...] to be formed only by
placing the Virama sign between two consonants."

        2) "When ‘zophola AA_sign’ is placed after Letter_E or Letter_AA it
is not considered to form conjunct with the vowel, it only serves to
function as a vowel modifier."

        3) "It would also be illegal to place it [a virama] after a vowel
sign."

Assumption #2 is irrelevant: the precise grammatical or phonetic function of
characters is not an issue for encoding.

Assumptions #1 and #3 are totally false.

It is true that virama, *normally*, *follows* a consonant, but this is not
a requirement.

And it is not true that it must necessarily *precede* another consonant. In
Sanskrit, Tamil and other languages, it is perfectly normal for a word to
end with a virama!

In general, viramas are characters like any other, and can occur
*anywhere*. And this is a general feature of Unicode: with a few reasonable
exceptions (e.g. unpaired surrogates), Unicode does not have a "syntax" that
stipulates which sequences of characters are legal and which are not.
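
To make this concrete, here is a minimal Python sketch. The sample strings
and the helper names round_trips and is_encodable are my own illustration,
not anything defined by Unicode. It only shows that strings with a virama in
"unusual" positions encode and decode like any other text, while an unpaired
surrogate is rejected by the UTF-8 encoder.

```python
# Sketch: a virama may sit anywhere in a string; only the unpaired
# surrogate is refused by the encoder. Names here are illustrative.
VIRAMA = "\u09CD"         # BENGALI SIGN VIRAMA
LETTER_A = "\u0985"       # BENGALI LETTER A
LETTER_KA = "\u0995"      # BENGALI LETTER KA
VOWEL_SIGN_AA = "\u09BE"  # BENGALI VOWEL SIGN AA

SAMPLES = [
    LETTER_KA + VIRAMA,                  # virama at the end of a word
    LETTER_A + VIRAMA,                   # virama after an independent vowel
    LETTER_KA + VOWEL_SIGN_AA + VIRAMA,  # virama after a vowel sign
    VIRAMA + LETTER_KA,                  # virama at the very start
]


def round_trips(text: str) -> bool:
    """Return True if text survives UTF-8 encoding and decoding unchanged."""
    return text.encode("utf-8").decode("utf-8") == text


def is_encodable(text: str) -> bool:
    """Return True if text can be encoded as UTF-8 at all."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for sample in SAMPLES:
        print([f"U+{ord(ch):04X}" for ch in sample], round_trips(sample))
    print("unpaired surrogate encodable:", is_encodable("\ud800"))
```

Whether a given font or rendering engine displays such sequences nicely is a
separate question from whether they are valid text.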

The common idea that virama is a sort of "control character" used to obtain
"conjunct glyphs" is a misunderstanding, IMHO. Viramas are *graphic*
characters that have their own shape (a sort of comma in Devanagari and
Bengali, a dot in Tamil, etc.), although *sometimes* they get absorbed into
ligatures.
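
One way to check this is to look at the character properties. The sketch
below uses Python's standard unicodedata module; the VIRAMAS table and the
describe helper are names I made up for the example. I am assuming that a
general category of Mn (nonspacing mark), as opposed to Cc (control) or Cf
(format), is a fair proxy for "graphic character" here.

```python
import unicodedata

# Viramas of the three scripts mentioned above.
VIRAMAS = {
    "Devanagari": "\u094D",
    "Bengali": "\u09CD",
    "Tamil": "\u0BCD",
}


def describe(ch: str) -> tuple[str, str]:
    """Return (character name, general category) for one character."""
    return unicodedata.name(ch), unicodedata.category(ch)


if __name__ == "__main__":
    for script, ch in VIRAMAS.items():
        name, category = describe(ch)
        print(f"{script}: U+{ord(ch):04X} {name} ({category})")
```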

> Conclusion
> ‘Vowel A_zophola_AA’ and ‘Vowel E_zophola_AA’ need to be
> included in the Bengali Unicode range as separate vowels.
> [...]

I have no opinion on whether this proposal should be accepted.

What I think, however, is that it is wrong to say that such a change is
*needed* for encoding Bengali. It would be a plus; maybe a nice plus, I
don't know, but not a necessity.

As I see it, zophola is just the special glyph used to represent the
sequence of these two characters:

        09CD (B. SIGN VIRAMA) + 09AF (B. LETTER YA)

The formation of this ligature can and should be totally *unconditional*: I
see no valid reason to bother checking for special conditions.

(OK: this sequence often follows a consonant, but why should "often" become
"always"?)

This unconditional ligation process can also be used for many other
virama+consonant sequences in many Indic scripts: look at Appendix B in
http://www.microsoft.com/typography/otspec/indicot/appen.htm -- all forms of
type B and P in that table are cases similar to this.

This means that:

- zophola (in *any* position) can be encoded as:
  09CD (B. SIGN VIRAMA) + 09AF (B. LETTER YA)

And, consequently:

- zophola_aa can be encoded as:
  09CD (B. SIGN VIRAMA) + 09AF (B. LETTER YA) + 09BE (B. VOWEL SIGN AA)

- vowel_a_zophola_aa can be encoded as:
  0985 (B. LETTER A) + 09CD (B. SIGN VIRAMA) + 09AF (B. LETTER YA) + 09BE
(B. VOWEL SIGN AA)

- vowel_e_zophola_aa can be encoded as:
  098F (B. LETTER E) + 09CD (B. SIGN VIRAMA) + 09AF (B. LETTER YA) + 09BE
(B. VOWEL SIGN AA)
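
The four encodings listed above can be written out as plain strings. This is
only a sketch in Python: the dictionary keys reuse the informal labels from
the list, and SEQUENCES, code_points and names are names I chose for the
example.

```python
import unicodedata

VIRAMA = "\u09CD"         # BENGALI SIGN VIRAMA
LETTER_YA = "\u09AF"      # BENGALI LETTER YA
VOWEL_SIGN_AA = "\u09BE"  # BENGALI VOWEL SIGN AA
LETTER_A = "\u0985"       # BENGALI LETTER A
LETTER_E = "\u098F"       # BENGALI LETTER E

ZOPHOLA = VIRAMA + LETTER_YA

SEQUENCES = {
    "zophola": ZOPHOLA,
    "zophola_aa": ZOPHOLA + VOWEL_SIGN_AA,
    "vowel_a_zophola_aa": LETTER_A + ZOPHOLA + VOWEL_SIGN_AA,
    "vowel_e_zophola_aa": LETTER_E + ZOPHOLA + VOWEL_SIGN_AA,
}


def code_points(text: str) -> list[str]:
    """Return the code points of text as four-digit hex strings."""
    return [f"{ord(ch):04X}" for ch in text]


def names(text: str) -> list[str]:
    """Return the Unicode character names of text, in order."""
    return [unicodedata.name(ch) for ch in text]


if __name__ == "__main__":
    for label, text in SEQUENCES.items():
        print(label, "=", " + ".join(code_points(text)))
```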

> The problem with [this] is that the string would have
> to be specifically looked for. [...]

Problem? Why a problem? The main job of a rendering engine is to look up the
glyphs that correspond to strings of one or more characters. Why should
*this* particular lookup be a problem?

Remember that I am proposing an *unconditional* ligature (09CD + 09AF is
always zophola, full stop).
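
A rough sketch of what "unconditional" means in practice, in Python. This is
a toy, not how any real rendering engine is written: to_glyphs and the glyph
names in angle brackets are hypothetical, and every character other than the
zophola pair is simply mapped to a placeholder named after its code point.

```python
ZOPHOLA_SEQUENCE = "\u09CD\u09AF"  # BENGALI SIGN VIRAMA + BENGALI LETTER YA
ZOPHOLA_GLYPH = "<zophola>"        # hypothetical glyph name


def to_glyphs(text: str) -> list[str]:
    """Map text to glyph names, replacing 09CD + 09AF with zophola.

    No context is examined: the pair becomes the zophola glyph whatever
    precedes or follows it.
    """
    glyphs = []
    i = 0
    while i < len(text):
        if text.startswith(ZOPHOLA_SEQUENCE, i):
            glyphs.append(ZOPHOLA_GLYPH)
            i += len(ZOPHOLA_SEQUENCE)
        else:
            glyphs.append(f"<U+{ord(text[i]):04X}>")
            i += 1
    return glyphs


if __name__ == "__main__":
    # Letter A + zophola + vowel sign AA
    print(to_glyphs("\u0985\u09CD\u09AF\u09BE"))
    # Letter KA + zophola
    print(to_glyphs("\u0995\u09CD\u09AF"))
```

The same substitution fires after a vowel letter and after a consonant,
which is the whole point.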

This makes it much easier than other cases, where a condition must be
checked. E.g., consider:

        0932 (DEVANAGARI LETTER LA) + 094D (DEVANAGARI SIGN VIRAMA)

This may correspond to the "half consonant" glyph (*if* another consonant
follows), or to the "full" glyph combined with a visible virama (*if* at
the end of a word), or to a special ad-hoc glyph (*if* a particular
consonant follows, e.g. another la), or...

... Bengali zophola is much simpler than this, ain't it?
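
For contrast, here is a sketch of the kind of conditional choice that the
Devanagari LA + virama case requires. It is deliberately simplified:
la_virama_glyphs and the glyph names are hypothetical, the consonant set is
just the range KA..HA (0915 to 0939), and anything that is not a consonant
is treated as "end of word".

```python
from typing import Optional

LA = "\u0932"      # DEVANAGARI LETTER LA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA

# Simplifying assumption: consonants are the letters KA..HA.
CONSONANTS = {chr(cp) for cp in range(0x0915, 0x093A)}


def la_virama_glyphs(next_char: Optional[str]) -> list[str]:
    """Pick glyphs for LA + VIRAMA depending on what follows.

    next_char is the character after the virama, or None at end of text.
    The returned glyph names are placeholders, not real font glyph names.
    """
    if next_char == LA:
        # A particular consonant follows: special ad-hoc glyph.
        return ["<la_la_ligature>"]
    if next_char in CONSONANTS:
        # Some other consonant follows: half form.
        return ["<half_la>"]
    # Nothing (or a non-consonant) follows: full glyph plus visible virama.
    return ["<la>", "<virama>"]


if __name__ == "__main__":
    print(la_virama_glyphs("\u0915"))  # followed by KA
    print(la_virama_glyphs(LA))        # followed by another LA
    print(la_virama_glyphs(None))      # end of word
```

Three branches already, and a real engine would need more; the zophola pair
needs none.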

_ Marco


