RE: Merging combining classes, was: New contribution N2676

From: Kent Karlsson (kentk@cs.chalmers.se)
Date: Tue Oct 28 2003 - 06:49:59 CST


Philippe Verdy wrote:
> There's a counterexample with the position of the circumflex on the
> lowercase t (I can't remember for which language it occurs, sorry),
> which is in some cases not the one that its combining class would
> normally take.

There are also the cases of comma below a small g (Latvian),
which is rendered turned above the g, and of ring below g (IPA),
which should be rendered above the g... Neither of these invalidates,
or calls into question, the combining classes of comma below (and
cedilla...) or ring below, as far as I can see.
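
To make the point concrete: the combining class is a property of the
character, looked up independently of where a font ends up drawing the
mark. A minimal Python sketch using the standard unicodedata module
(the selection of marks to inspect is mine, purely for illustration):

```python
import unicodedata

# Marks mentioned in this thread (this selection is illustrative only).
marks = {
    "COMBINING COMMA BELOW": "\u0326",
    "COMBINING CEDILLA": "\u0327",
    "COMBINING RING BELOW": "\u0325",
    "COMBINING CIRCUMFLEX ACCENT": "\u0302",
}

# The canonical combining class comes from the character database;
# it does not depend on the base letter or on glyph placement.
classes = {name: unicodedata.combining(ch) for name, ch in marks.items()}
for name, ccc in classes.items():
    print(f"{name}: ccc={ccc}")

# The precomposed small g with cedilla decomposes to g + COMBINING CEDILLA,
# even when a font draws the mark as a turned comma above the g.
g_cedilla_decomposition = unicodedata.decomposition("\u0123")
print(g_cedilla_decomposition)
```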

So far, it has been noticed that some Hebrew and Arabic marks,
mostly the vowel marks, have inappropriate combining classes.
The solution suggested by the UTC is to use CGJ. But any solution also
has to be simple and practicable. Putting a CGJ after each occurrence
of a character with a badly assigned combining class effectively
gives it a combining class of 0. Perhaps not ideal, and indeed
a kludge. But simple and practical. A keyboard layout, for instance,
can just generate a CGJ after each troublesome Arabic and Hebrew
mark. With current keyboard layout specification mechanisms,
that's about the best that can be done on the keyboard side of it.
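
A rough Python sketch of what "a CGJ after each troublesome mark" does
under normalisation. The helper names (is_troublesome, add_cgj) are
hypothetical, and treating every mark with a combining class from 10
to 35 as troublesome is my assumption for the example, not a list
agreed anywhere:

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER, canonical combining class 0


def is_troublesome(ch):
    # Assumption for illustration only: the "fixed position" classes
    # 10..35, which cover the Hebrew points and Arabic vowel marks.
    return 10 <= unicodedata.combining(ch) <= 35


def add_cgj(text):
    # Mimic a keyboard layout that emits a CGJ after each troublesome mark.
    return "".join(ch + CGJ if is_troublesome(ch) else ch for ch in text)


lamed, patah, hiriq = "\u05dc", "\u05b7", "\u05b4"
typed = lamed + patah + hiriq  # patah (ccc 17) typed before hiriq (ccc 14)

# Without CGJ, canonical reordering swaps the two points.
reordered = unicodedata.normalize("NFD", typed)

# With a CGJ after each point, the typed order survives normalisation.
guarded = add_cgj(typed)
guarded_nfd = unicodedata.normalize("NFD", guarded)

print([f"{ord(c):04X}" for c in reordered])
print([f"{ord(c):04X}" for c in guarded_nfd])
```

Since CGJ itself has combining class 0, no mark can be reordered
across it, which is the whole trick.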

Removing superfluous CGJs should be done by a separate utility.
Trying to build that into normalisation is probably not such a good
idea.
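
Such a utility could look roughly like the Python sketch below. It
rests on two assumptions of mine: a CGJ counts as superfluous when
taking it out causes no canonical reordering, and any other purpose a
CGJ may serve (collation, for instance) is ignored. The function name
is hypothetical, and the result is returned in NFD:

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER


def strip_superfluous_cgj(text):
    # Work on NFD text so that removing a CGJ can only trigger
    # reordering, never a fresh decomposition.
    s = unicodedata.normalize("NFD", text)
    i = 0
    while i < len(s):
        if s[i] == CGJ:
            candidate = s[:i] + s[i + 1:]
            # If the text is still in NFD without this CGJ, the CGJ
            # was not blocking any reordering, so drop it.
            if unicodedata.normalize("NFD", candidate) == candidate:
                s = candidate
                continue  # re-examine the character now at index i
        i += 1
    return s


lamed, patah, hiriq = "\u05dc", "\u05b7", "\u05b4"

# hiriq (ccc 14) before patah (ccc 17): already in canonical order,
# so both CGJs can go.
in_order = strip_superfluous_cgj(lamed + hiriq + CGJ + patah + CGJ)

# patah before hiriq: the first CGJ preserves the typed order and stays;
# only the trailing one is removed.
out_of_order = strip_superfluous_cgj(lamed + patah + CGJ + hiriq + CGJ)
```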

Defining new characters to replace the troublesome ones, a more
elegant solution, has been rejected by the UTC. On compatibility
grounds, IIRC.

                /kent k




