My colleague Simon Feather wrote:
> I have a problem using Unicode to display Indic scripts (Bengali,
> etc.). Can anyone please help explain what is happening, what the correct
> behaviour *should* be?
See "ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html": it
answers all of your questions, albeit very succinctly.
> a) are these "combining characters" or "graphemes"?
U+09A6 (BENGALI LETTER DA) is not a combining character, because its general
category is "Lo" (Other Letter). U+09BF (BENGALI VOWEL SIGN I) is a combining
character, because its general category is "Mc" (Spacing Combining Mark).
By "grapheme" I mean "any written symbol", so of course they both are.
Perhaps I am missing a special technical meaning of the term here?
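For anyone who wants to check these properties directly, here is a minimal sketch using Python's standard `unicodedata` module. Note the assumption: that module reflects the Unicode version of your Python build (much newer than 3.0), though as far as I know the general categories of these two characters have not changed. The helper `is_combining_mark` is a hypothetical name of my own; it simply tests for the mark categories Mn, Mc and Me.

```python
import unicodedata

# The two characters from the question, in logical (storage) order.
DA = "\u09A6"       # BENGALI LETTER DA
VOWEL_I = "\u09BF"  # BENGALI VOWEL SIGN I

def is_combining_mark(ch: str) -> bool:
    """Hypothetical helper: True if the general category is Mn, Mc or Me."""
    return unicodedata.category(ch).startswith("M")

for ch in (DA, VOWEL_I):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"category={unicodedata.category(ch)}, "
          f"combining mark={is_combining_mark(ch)}")
```

This prints category "Lo" (not a mark) for DA and "Mc" (a mark) for the vowel sign I.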
> b) if they are combining characters, then according to my understanding of
> combination, we should be seeing different canonical values for these two
> characters, but the tables have them both with canonical order zero i.e.
> they are both starters - surely this cannot be right?
Whether a character is combining or not is determined by its general
category, not by its canonical combining class.
The canonical combining class 0 covers several different kinds of characters:
"spacing, split, enclosing, reordrant, and Tibetan subjoined". U+09A6 (BENGALI
LETTER DA) is a "spacing" character, while U+09BF (BENGALI VOWEL SIGN I) falls
in the "reordrant" case.
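The difference between general category and combining class can be seen with another short Python sketch (again assuming the standard `unicodedata` module, whose data is newer than Unicode 3.0). U+09CD (BENGALI SIGN VIRAMA) is added only for contrast, as a mark with a nonzero class. Because canonical reordering only moves characters with a nonzero combining class, normalization leaves both orders of DA + I untouched, so the two spellings remain distinct strings.

```python
import unicodedata

DA = "\u09A6"       # BENGALI LETTER DA, category Lo
VOWEL_I = "\u09BF"  # BENGALI VOWEL SIGN I, category Mc
VIRAMA = "\u09CD"   # BENGALI SIGN VIRAMA, added for contrast (nonzero class)

for ch in (DA, VOWEL_I, VIRAMA):
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)} "
          f"category={unicodedata.category(ch)}")
# U+09A6 ccc=0 category=Lo
# U+09BF ccc=0 category=Mc
# U+09CD ccc=9 category=Mn

# Both characters are class 0 (starters), so NFD never swaps them:
logical = DA + VOWEL_I
swapped = VOWEL_I + DA
print(unicodedata.normalize("NFD", logical) == logical)  # True
print(unicodedata.normalize("NFD", swapped) == swapped)  # True
print(unicodedata.normalize("NFD", logical)
      == unicodedata.normalize("NFD", swapped))          # False
```

In other words, a vowel sign can be a combining character and a starter at the same time, and "AB" and "BA" are genuinely different text.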
> c) should it matter which order they appear in the UTF-8 data? If they
> are combining characters, then again according to my understanding AB does
> not equal BA, which is I think what we're seeing here.
In UTF-8 (or in any other form of Unicode) the Bengali syllable "di" should
be spelled in logical order, as:
U+09A6 (BENGALI LETTER DA) U+09BF (BENGALI VOWEL SIGN I)
and displayed in reverse order, as:
[glyph for U+09BF] [glyph for U+09A6]
This is why U+09BF (BENGALI VOWEL SIGN I) is called "reordrant".
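To make the storage-versus-display distinction concrete, here is a toy Python sketch. The set `PRE_BASE_SIGNS` and the function `visual_order` are hypothetical illustrations of my own, covering only the simple Bengali pre-base vowel signs I, E and AI after a single consonant. A real rendering engine applies full script-specific shaping rules (conjuncts, split vowels, and so on), which this does not attempt.

```python
DA = "\u09A6"       # BENGALI LETTER DA
VOWEL_I = "\u09BF"  # BENGALI VOWEL SIGN I (pre-base, "reordrant")

# Hypothetical, deliberately incomplete set of pre-base vowel signs (I, E, AI).
PRE_BASE_SIGNS = {"\u09BF", "\u09C7", "\u09C8"}

def visual_order(text: str) -> list[str]:
    """Toy reordering: move each pre-base vowel sign in front of the
    character that precedes it. No conjuncts, no split vowels."""
    out: list[str] = []
    for ch in text:
        if ch in PRE_BASE_SIGNS and out:
            out.insert(len(out) - 1, ch)  # place before the preceding consonant
        else:
            out.append(ch)
    return out

syllable = DA + VOWEL_I           # logical (storage) order: consonant first
print(syllable.encode("utf-8"))   # b'\xe0\xa6\xa6\xe0\xa6\xbf'
print([f"U+{ord(c):04X}" for c in visual_order(syllable)])  # ['U+09BF', 'U+09A6']
```

The UTF-8 data should therefore contain DA's bytes first; only the glyphs on screen come out the other way round.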
> d) and if the order matters, then is it Unitype that has it wrong or the
> display mechanism?
I would say that Unitype Globalwriter behaves itself properly, while your
browser is the naughty guy.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:59 EDT