Re: Merging combining classes, was: New contribution N2676

From: Philippe Verdy (
Date: Mon Oct 27 2003 - 09:28:00 CST

From: "Peter Kirk" <>

> So the logical order is
> <shin, sin/shin dot, dagesh, vowel, meteg>.
> But the canonical order is
> <shin, vowel, dagesh, meteg, sin/shin dot>;
> up to three (and in theory
> more, at least in biblical Hebrew) other characters may appear between
> the base letter and the dot which fundamentally modifies it.
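
To make the quoted ordering concrete, here is a minimal sketch (an illustration added here, not part of the original exchange). It applies the canonical ordering step by hand, using only the combining classes from Python's standard `unicodedata` module, and checks the result against NFD. The helper `canonical_order` is hypothetical, and qamats stands in for "vowel".

```python
import unicodedata

# Hebrew code points from the quoted sequence
SHIN = "\u05E9"      # HEBREW LETTER SHIN, ccc 0
SHIN_DOT = "\u05C1"  # HEBREW POINT SHIN DOT, ccc 24
DAGESH = "\u05BC"    # HEBREW POINT DAGESH OR MAPIQ, ccc 21
QAMATS = "\u05B8"    # HEBREW POINT QAMATS (a vowel), ccc 18
METEG = "\u05BD"     # HEBREW POINT METEG, ccc 22


def canonical_order(s: str) -> str:
    """Hypothetical helper: stable-sort every run of non-starters (ccc > 0)
    by combining class. Assumes s needs no decomposition."""
    out, run = [], []
    for ch in s:
        if unicodedata.combining(ch) == 0:
            # A starter (ccc 0) closes the current run of marks.
            out.extend(sorted(run, key=unicodedata.combining))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


logical = SHIN + SHIN_DOT + DAGESH + QAMATS + METEG
canonical = canonical_order(logical)

# The shin dot (ccc 24) drifts behind the vowel, the dagesh and the meteg.
assert canonical == SHIN + QAMATS + DAGESH + METEG + SHIN_DOT
assert canonical == unicodedata.normalize("NFD", logical)
```

Marks with equal combining classes are never swapped, which is why the stability of `sorted()` matters here.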

Oh, I forgot the case of the dagesh consonant modifier.

But why would you want to encode the meteg before the vowel that it
modifies? Couldn't it be encoded locally, right after that vowel, like this:
1) The consonant group: <shin or other base consonant>, <sin/shin dot>, <dagesh>,
2) The first vowel group: <vowel>, <meteg or other accents>.

From a Hebrew reader's perspective, this logical order makes sense, as it
consistently groups the marks in the order in which they actually modify the
letter:
- One first reads the <shin or other base consonant>
- Then alters it into a sin (or shin) letter with <shin or other base
consonant>, <sin/shin dot>
- Then selects the alternate pronunciation by adding the <dagesh>
- Then recognizes the first "base" vowel sign
- Then alters it according to the added accents

You agree with me that the use of "combining order overrides" (CCO) must be
restricted so that it won't be abused. The idea of using CGJ to encode them
may be counterproductive, but one can simply avoid such abuse by creating such
a CCO control within each script (here, in the Hebrew block) and naming it
simply HEBREW VOWEL GROUP HOLDER, with properties similar to those of other
Hebrew base consonants.
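
The HEBREW VOWEL GROUP HOLDER is only a proposal and has no code point. The minimal sketch below assumes that, like a base consonant, it would have canonical combining class 0, and it uses the existing COMBINING GRAPHEME JOINER (U+034F, also ccc 0) as a stand-in. Any ccc-0 character ends a run of combining marks, so normalization cannot move a mark across it.

```python
import unicodedata

SHIN = "\u05E9"      # HEBREW LETTER SHIN, ccc 0
SHIN_DOT = "\u05C1"  # HEBREW POINT SHIN DOT, ccc 24
DAGESH = "\u05BC"    # HEBREW POINT DAGESH OR MAPIQ, ccc 21
QAMATS = "\u05B8"    # HEBREW POINT QAMATS, ccc 18
METEG = "\u05BD"     # HEBREW POINT METEG, ccc 22
CGJ = "\u034F"       # COMBINING GRAPHEME JOINER, ccc 0: stand-in for the proposed holder

assert unicodedata.combining(CGJ) == 0

# Consonant group, then the ccc-0 separator, then the vowel group.
grouped = SHIN + SHIN_DOT + DAGESH + CGJ + QAMATS + METEG
nfd = unicodedata.normalize("NFD", grouped)

# Reordering still happens inside each group (dagesh, ccc 21, moves ahead of the
# shin dot, ccc 24), but nothing crosses the separator, so the shin dot stays
# in front of the vowel.
assert nfd == SHIN + DAGESH + SHIN_DOT + CGJ + QAMATS + METEG
print(" ".join(f"U+{ord(c):04X}" for c in nfd))
```

Note that this preserves only the group boundary; the dagesh and the shin dot are still reordered relative to each other within the consonant group.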

The same kind of character could be added in one of the Arabic blocks, and
possibly in other scripts such as Tibetan where similar issues may arise, or in
rare extinct scripts as an "implied" missing base letter that would help fix
the combining order.

This principle may help resolve the ambiguities in all of the affected scripts.
Maybe there are similar issues in the Latin script for Vietnamese, where one
would like to better match the phonetics of words that may be incorrectly
rendered under the currently required normalization order of multiple accents.
Such an issue also exists when the visual stacking order of accents on Latin
letters needs to change (for example, if a macron should appear below or above
a diaeresis). In this case, the CCO control added to the general
(Latin/Greek/Cyrillic) script would more likely be named something
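
As an illustration of the "currently required normalization order" for Vietnamese (added here, not from the original message), the minimal sketch below uses Python's standard `unicodedata` module. Whatever order the accents are typed in, normalization places the dot below (ccc 220) before the circumflex (ccc 230), and NFC composes the result into the single letter U+1EAD.

```python
import unicodedata

A = "a"
CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT, ccc 230 (above)
DOT_BELOW = "\u0323"   # COMBINING DOT BELOW, ccc 220 (below)
A_CIRCUMFLEX = "\u00E2"  # precomposed a with circumflex

inputs = (
    A + CIRCUMFLEX + DOT_BELOW,  # circumflex typed first
    A + DOT_BELOW + CIRCUMFLEX,  # dot below typed first
    A_CIRCUMFLEX + DOT_BELOW,    # precomposed letter plus dot below
)

for typed in inputs:
    nfd = unicodedata.normalize("NFD", typed)
    nfc = unicodedata.normalize("NFC", typed)
    # Every input ends up with the dot below ahead of the circumflex.
    assert nfd == A + DOT_BELOW + CIRCUMFLEX
    assert nfc == "\u1EAD"  # LATIN SMALL LETTER A WITH CIRCUMFLEX AND DOT BELOW
```

Because the two marks have different combining classes, the typed order is not preserved; only marks of equal class keep their relative order.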

And why not in Japanese too, if diacritics need to be added on top of
Hiragana/Katakana letters that carry voiced sound marks?

I see the need for CCO control characters as a general problem rather than
something specific to one language (like Biblical Hebrew), and I see no reason
why such controls could not be accepted and generalized with their own
character property category.

This archive was generated by hypermail 2.1.5 : Thu Jan 18 2007 - 15:54:24 CST