Re: Questions about proposed characters

From: Michael Everson (
Date: Sun May 30 1999 - 12:56:21 EDT

The kind of ligature you are requesting, Adam, would be very bad for your
language in the long run, because you'd never know when something was
spelled <ch> and when <c><h>. Unicode deliberately doesn't encode such
digraphs for that reason. Many of the digraphs already encoded are either
from legacy character sets (such as a lot of the Arabic presentation forms,
as far as I know) or serve specific odd practices, like the Croatian
digraphs, which are there only to give one-to-one transliteration to Serbian.
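To illustrate the "<ch> or <c><h>?" problem with a digraph that is already encoded, here is a small Python sketch using only the standard `unicodedata` module. The choice of U+01C6 (ǆ, one of the Croatian digraph characters) as the example and the helper name `same_text` are ours, for illustration. The single code point and the two-letter sequence look alike, but they only compare equal after compatibility normalization (NFKC/NFKD), not under canonical normalization (NFC/NFD).

```python
import unicodedata

# U+01C6 LATIN SMALL LETTER DZ WITH CARON: a single-code-point digraph.
DIGRAPH = "\u01c6"        # "ǆ" as one character
SEQUENCE = "d\u017e"      # "d" followed by "ž" (U+017E), two characters

def same_text(a: str, b: str, form: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both with `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    # Only the compatibility forms (NFKC, NFKD) treat the two spellings as equal.
    print(form, same_text(DIGRAPH, SEQUENCE, form))
```

Any text process that does not apply compatibility normalization will therefore treat the two spellings as different strings.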

Sometimes there may be a real advantage in processing if a ligature is
encoded, even when it is canonically equivalent to a string of other
characters. Mark Shoulson and I believe this is true of the HEBREW
TETRAGRAMMATON. Apparently some Hangul processing algorithms work better
with precomposed syllables (though historical syllables have to be
processed using a different model).
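Hangul is a case where each precomposed syllable is canonically equivalent to a sequence of conjoining jamo, and the mapping is purely arithmetic. The Python sketch below uses the constants of the Hangul syllable decomposition described in the Unicode Standard (Chapter 3); the function name `decompose_syllable` is ours. It handles only the precomposed modern syllables (U+AC00..U+D7A3) and passes everything else through unchanged, which is consistent with historical syllables needing a different model.

```python
import unicodedata

# Constants of the Hangul syllable decomposition (Unicode Standard, Chapter 3).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT   # 11172 precomposed syllables

def decompose_syllable(ch: str) -> str:
    """Split one precomposed modern Hangul syllable into conjoining jamo.
    Characters outside the precomposed range are returned unchanged."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    lead = chr(L_BASE + s_index // N_COUNT)
    vowel = chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    t_index = s_index % T_COUNT
    # A trailing consonant is present only when t_index is non-zero.
    return lead + vowel + (chr(T_BASE + t_index) if t_index else "")

word = "\ud55c\uae00"  # "한글": two precomposed syllables
jamo = "".join(decompose_syllable(c) for c in word)
# The arithmetic result matches the standard library's canonical decomposition.
assert jamo == unicodedata.normalize("NFD", word)
print(len(word), len(jamo))  # 2 6
```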

Welsh, Spanish, Irish, and English all use the digraph <c><h> to represent
a single sound, just as Slovak does. Welsh and Spanish also sort it as a
separate letter, as Slovak does. But it would be bad to encode <ch>.
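Sorting <ch> as a letter of its own does not require a separate code point: a collation can read the sequence <c><h> as one unit (a contraction). The Python sketch below assumes a hypothetical, heavily simplified Slovak-like alphabet (lowercase ASCII only, no accented letters) in which "ch" sorts after "h". Real collation tailorings handle far more than this.

```python
# Hypothetical, simplified alphabet: "ch" is its own letter, sorted after "h".
ALPHABET = ["a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
            "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}

def sort_key(word: str) -> list[int]:
    """Map a lowercase ASCII word to letter ranks, reading <c><h> as one unit."""
    key, i = [], 0
    while i < len(word):
        if word.startswith("ch", i):
            key.append(RANK["ch"])  # contraction: two characters, one letter
            i += 2
        else:
            key.append(RANK[word[i]])
            i += 1
    return key

words = ["chlieb", "cesta", "hora", "ihla", "cukor"]
print(sorted(words, key=sort_key))
```

Here "chlieb" lands after "hora" and before "ihla", while the text itself is still stored as plain <c><h>.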

Michael Everson, Everson Gunn Teoranta **
15 Port Chaeimhghein Íochtarach; Baile Átha Cliath 2; Éire/Ireland
Guthán: +353 1 478-2597 ** Facsa: +353 1 478-2597 (by arrangement)
27 Páirc an Fhéithlinn;  Baile an Bhóthair;  Co. Átha Cliath; Éire
