There is already a canonical order of combining marks. It is described
in sections 3.9 and 4.2 of the Unicode Standard, Version 2. Ordering
information is available at http://www.unicode.org.
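
As a rough illustration of what that canonical order does (a sketch, not
text from the Standard), the following uses Python's unicodedata module.
Two assumptions: the module reflects a later version of the Unicode data
than Version 2, and the helper names combining_classes and canonical_order
are made up for this example. It shows that marks of different combining
classes are sorted into a fixed order, that marks of the same class keep
their relative order, and how the first sequence in the example quoted
below composes.

```python
import unicodedata

O = "\u004F"                 # LATIN CAPITAL LETTER O
DOT_BELOW = "\u0323"         # COMBINING DOT BELOW
CIRCUMFLEX_BELOW = "\u032D"  # COMBINING CIRCUMFLEX ACCENT BELOW
CIRCUMFLEX = "\u0302"        # COMBINING CIRCUMFLEX ACCENT


def combining_classes(text):
    """Return the canonical combining class of each code point in text."""
    return [unicodedata.combining(ch) for ch in text]


def canonical_order(text):
    """Decompose text and sort its combining marks canonically (NFD)."""
    return unicodedata.normalize("NFD", text)


# Marks of different classes are sorted, so both spellings end up identical.
below_then_above = O + DOT_BELOW + CIRCUMFLEX
above_then_below = O + CIRCUMFLEX + DOT_BELOW
same_after_ordering = (
    canonical_order(below_then_above) == canonical_order(above_then_below)
)

# Marks of the same class are never swapped, so these two stay distinct.
dot_first = O + DOT_BELOW + CIRCUMFLEX_BELOW
circumflex_first = O + CIRCUMFLEX_BELOW + DOT_BELOW
distinct_after_ordering = (
    canonical_order(dot_first) != canonical_order(circumflex_first)
)

# Composition (NFC) of the first sequence from the quoted example.
composed = unicodedata.normalize(
    "NFC", O + DOT_BELOW + CIRCUMFLEX_BELOW + CIRCUMFLEX
)

print(combining_classes(DOT_BELOW + CIRCUMFLEX_BELOW + CIRCUMFLEX))
print(same_after_ordering, distinct_after_ordering)
print([f"U+{ord(ch):04X}" for ch in composed])
```

The two marks below the base share one combining class, which is why their
relative order is significant and is left alone; the mark above has a
different class and can be moved past them freely.
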
> John Cowan wrote:
> > Thus LATIN CAPITAL LETTER O plus COMBINING DOT BELOW plus
> > COMBINING CIRCUMFLEX BELOW plus COMBINING CIRCUMFLEX (to make
> > up an example) can be reduced to LATIN CAPITAL LETTER O WITH
> > CIRCUMFLEX AND DOT BELOW (U+1ED8) plus COMBINING CIRCUMFLEX BELOW,
> > but if DOT BELOW comes after CIRCUMFLEX BELOW, the shortest form
> > is LATIN CAPITAL LETTER O WITH CIRCUMFLEX plus COMBINING DOT
> > BELOW plus COMBINING CIRCUMFLEX BELOW.
> Hmm... I think first of all, a canonical order of all combining marks is
> needed. The combining marks fall into three classes: strike-through, below,
> and above (maybe those wide combining marks form a fourth class). Note that
> you cannot reorder the combining marks within one class without changing
> the character:
> > but if DOT BELOW comes after CIRCUMFLEX BELOW
> In this case, the dot is displayed under the circumflex, whereas in the
> original case it was above it.
> I am not sure which order of the classes is the best (strike-through >
> below > above or strike-through > above > below). A good analysis would
> take the language of the text data into account (and analyse, e.g., for
> Vietnamese, A WITH CIRCUMFLEX as a base letter), but this is impossible for
> a canonical algorithm, which must work on untagged data.
> --J"org Knappen