RE: Problems using Unicode to display Indic scripts?

From: Marco.Cimarosti@icl.com
Date: Thu Mar 02 2000 - 07:28:36 EST


My colleague Simon Feather wrote:

> I have a problem using Unicode to display Indic scripts (Bengali, Gujerati,
> etc.). Can anyone please help explain what is happening, what the correct
> behaviour *should* be?

See "ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html": it has
all the answers to your questions, although in a very succinct way.

> a) are these "combining characters" or "graphemes"?

U+09A6 (BENGALI LETTER DA) is not a combining character, because its general
category is "Lo" (Other Letter). U+09BF (BENGALI VOWEL SIGN I) is a combining
character, because its general category is "Mc" (Spacing Combining Mark).

By "grapheme" I understand "any written symbol", so of course they both
are. Am I perhaps missing a special technical meaning of the term here?
 
> b) if they are combining characters, then according to my understanding of
> combination, we should be seeing different canonical values for these two
> characters, but the tables have them both with canonical order zero i.e.
> they are both starters - surely this cannot be right?

Whether a character is combining or not is determined by its general
category, not by its canonical combining class.

The canonical combining class "0" means several different things: "spacing,
split, enclosing, reordrant, and Tibetan subjoined". U+09A6 (BENGALI LETTER
DA) is a "spacing" character, while U+09BF (BENGALI VOWEL SIGN I) falls in
the "reordrant" case.
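
For anyone who wants to check these two properties without reading the data
file, here is a minimal sketch in Python. It assumes the standard
"unicodedata" module, which ships a later version of the character database
than 3.0; the helper names describe() and is_combining() are mine, for
illustration only.

```python
import unicodedata

def describe(ch):
    """Return (name, general category, canonical combining class)."""
    return (unicodedata.name(ch),
            unicodedata.category(ch),
            unicodedata.combining(ch))

def is_combining(ch):
    # Combining-ness comes from the general category (Mn, Mc, Me),
    # not from the canonical combining class.
    return unicodedata.category(ch).startswith("M")

for ch in ("\u09A6", "\u09BF"):
    print("U+%04X" % ord(ch), describe(ch), is_combining(ch))
```

Both characters report combining class 0, yet only U+09BF is reported as a
combining mark.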

> c) should it matter which order they appear in the UTF-8 data? If they
> are combining characters, then again according to my understanding AB does
> not equal BA, which is I think what we're seeing here.

In UTF-8 (or in any other form of Unicode) the Bengali syllable "di" should
be spelled as:

        U+09A6 U+09BF

and displayed in reverse order, as:

        [glyph for U+09BF] [glyph for U+09A6]

This is why U+09BF (BENGALI VOWEL SIGN I) is called "reordrant".
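
To illustrate the stored (logical) order, here is a small Python sketch. The
names DI, SWAPPED and starts_with_base() are made up for this example, and
the check is only a rough assumption about what "well-formed" means here,
not a rendering algorithm. It shows the UTF-8 bytes for the correct spelling
and that normalization does not swap the two characters back, since both
have combining class 0.

```python
import unicodedata

DI = "\u09A6\u09BF"       # logical order: letter DA, then vowel sign I
SWAPPED = "\u09BF\u09A6"  # visual order stored by mistake

def starts_with_base(text):
    # True when the first character is not a combining mark (Mn, Mc, Me).
    return not unicodedata.category(text[0]).startswith("M")

def utf8_hex(text):
    # Hex dump of the UTF-8 encoding, one group of bytes per character.
    return " ".join(ch.encode("utf-8").hex() for ch in text)

print(utf8_hex(DI))
print(starts_with_base(DI), starts_with_base(SWAPPED))
# Both characters are starters (class 0), so NFC leaves the order alone.
print(unicodedata.normalize("NFC", SWAPPED) == DI)
```

So AB and BA remain different strings; putting the vowel sign to the left of
the consonant is the job of the display layer, not of the encoded data.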

> d) and if the order matters, then is it Unitype that has it wrong or the
> display mechanism?

I would say that Unitype Globalwriter behaves itself properly, while your
browser is the naughty guy.

Ciao.
        Marco


