Re: Merging combining classes, was: New contribution N2676

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Oct 27 2003 - 14:48:20 CST


From: "Peter Constable" <petercon@microsoft.com>

> There is no problem requiring a solution for combining marks used with
> Latin script,* including IPA and Vietnamese, because all of the marks
> that occupy a comparable space relative to the base have the same
> combining class, meaning that normalization does not affect the order.
> For such combinations, there is no "required normalization order".
>
> So, for instance, the sequences < ..., combining macron, combining
> diaeresis, ... > and < ..., combining diaeresis, combining macron, ... >
> are both in canonical order and *not* canonically equivalent.
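
The point quoted above can be checked with a small Python sketch using the
standard unicodedata module. The variable names are mine and purely
illustrative. NFD is used rather than NFC so that precomposed letters do not
obscure the mark order.

```python
import unicodedata

MACRON = "\u0304"     # COMBINING MACRON
DIAERESIS = "\u0308"  # COMBINING DIAERESIS

# Both marks sit above the base and share one canonical combining class.
print(unicodedata.combining(MACRON), unicodedata.combining(DIAERESIS))  # 230 230

seq_a = "a" + MACRON + DIAERESIS
seq_b = "a" + DIAERESIS + MACRON

nfd_a = unicodedata.normalize("NFD", seq_a)
nfd_b = unicodedata.normalize("NFD", seq_b)

# Neither sequence is reordered, and the two stay distinct after
# normalization, so they are not canonically equivalent.
print(nfd_a == seq_a, nfd_b == seq_b, nfd_a == nfd_b)  # True True False
```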

You're right about the existing diacritics used in Latin-script text for
existing Latin-based languages. This will remain true as long as no
diacritic spans several positions in the stack (for example, a diacritic
that takes a left or right position depending on the context).

There is a counterexample in the position of the circumflex on the
lowercase t (I can't remember which language it occurs in, sorry), which in
some cases is not the position its combining class would normally imply.

In that case, the diacritic ends up competing for the same place as another
diacritic that has a distinct combining class, and the relative encoding
order of the two diacritics becomes important for specifying the layout
order in the visual stack.
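
To illustrate why the encoding order cannot carry that information when the
classes differ, here is a minimal Python sketch. The choice of marks
(circumflex, class 230, and dot below, class 220) is my own assumption for
illustration; it is not the specific t-with-circumflex case mentioned above.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT, class 230
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW, class 220

print(unicodedata.combining(CIRCUMFLEX), unicodedata.combining(DOT_BELOW))  # 230 220

typed_1 = "t" + CIRCUMFLEX + DOT_BELOW
typed_2 = "t" + DOT_BELOW + CIRCUMFLEX

norm_1 = unicodedata.normalize("NFD", typed_1)
norm_2 = unicodedata.normalize("NFD", typed_2)

# Canonical ordering sorts marks by combining class, so both inputs
# normalize to the same string and the original order is lost.
print(norm_1 == norm_2)    # True
print(norm_1 == typed_2)   # True: the lower class (220) comes first
```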

Of course, as long as Latin diacritics are rendered at the position
implied by their combining class, this is fine. In summary: the normative
combining class of a combining character affects its glyph representation,
and font renderers are not free to place the diacritic according to
linguistic needs, depending on the base character. And if a renderer cannot
move an above-detached diacritic to the desired above-right-attached
position, there will be an encoding conflict whenever other diacritics are
also present (for example an arrow above).

For this reason, there may be cases, even in the Latin script, where
canonical equivalence has a bad effect because diacritics need to be
represented differently. Inserting a class-override control character in
such cases may also help solve the problem, so that the text can be rendered
and interpreted correctly.
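
As a rough sketch of how such a control character could behave, the Python
below uses U+034F COMBINING GRAPHEME JOINER as a stand-in. That choice is my
assumption: this message does not name a specific character. CGJ has
combining class 0, so canonical reordering does not move marks across it.
The marks chosen are again only illustrative.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT, class 230
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW, class 220
CGJ = "\u034f"         # COMBINING GRAPHEME JOINER, class 0 (stand-in)

plain = "t" + CIRCUMFLEX + DOT_BELOW
guarded = "t" + CIRCUMFLEX + CGJ + DOT_BELOW

plain_nfd = unicodedata.normalize("NFD", plain)
guarded_nfd = unicodedata.normalize("NFD", guarded)

# Without the control character the two marks are swapped.
print(plain_nfd == plain)      # False
# With a class-0 character between them, the order survives normalization.
print(guarded_nfd == guarded)  # True
```

Whether a renderer then lays out the marks in the desired positions is a
separate question that this sketch does not address.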


