From: Kenneth Whistler (
Date: Mon Feb 10 2003 - 22:40:12 EST

    António MARTINS-Tuválkin (with no diaeresis !) asked:

    > Anyway, I noted once more that many Cyrillic letters I'd consider as
    > "base letter + diacritical" composites are not decomposable according to
    > Unicode. I planned to delve deeper into this, but is there a short
    > answer for it?

    The short answer is that the extended Cyrillic characters
    in question use diacritics that are mostly various distortions
    of the base letterforms (the descender ticks and the various
    hook forms) or involve bars across letter strokes. Long ago
    it was decided that it would not be a good idea to extend
    formal character decomposition to such base letterform shape
    changes or bars across letters. (Note that Latin characters
    with bars: barred-b, barred-d, barred-i, barred-u, barred-l,
    and the like are also not decomposed formally. Similarly for
    Latin letters with hooks, and so on.)
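
    As a rough check of the point above, here is a minimal sketch using
    Python's standard unicodedata module, which exposes the decomposition
    mapping from whatever Unicode Character Database version ships with the
    interpreter. The sample characters and the has_decomposition helper are
    illustrative choices, not a list taken from this message.

```python
import unicodedata

# Sample characters picked for illustration (an assumption, not exhaustive):
# letters whose "diacritic" is a bar, descender tick, or hook.
SAMPLES = [
    "\u0493",  # CYRILLIC SMALL LETTER GHE WITH STROKE
    "\u049B",  # CYRILLIC SMALL LETTER KA WITH DESCENDER
    "\u0180",  # LATIN SMALL LETTER B WITH STROKE
    "\u0111",  # LATIN SMALL LETTER D WITH STROKE
    "\u0142",  # LATIN SMALL LETTER L WITH STROKE
    "\u0253",  # LATIN SMALL LETTER B WITH HOOK
]

def has_decomposition(ch):
    """True if the character database lists any decomposition mapping for ch."""
    return unicodedata.decomposition(ch) != ""

for ch in SAMPLES:
    # Each of these is expected to report no decomposition mapping.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"decomposable={has_decomposition(ch)}")
```

    Normalizing any of these to NFD should likewise leave the character
    unchanged, since there is no mapping to apply.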

    So formal canonical decompositions are almost entirely
    confined to separable, accent-like diacritics (acute,
    grave, diaeresis, and so on). The only significant exceptions are
    the cedilla and ogonek, which attach smoothly to letter
    bottoms without otherwise distorting them, and which
    often have graphic alternates that are, indeed, separated
    diacritics (comma-like and reverse-comma-like forms).
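
    For contrast, a small sketch of the separable case, again with Python's
    standard unicodedata module. The canonical_parts helper and the sample
    characters are hypothetical choices for illustration; the results depend
    on the Unicode data bundled with the interpreter.

```python
import unicodedata

def canonical_parts(ch):
    """Return the code points of ch's canonical (NFD) decomposition as hex strings."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

# Sample characters picked for illustration (an assumption, not exhaustive).
SAMPLES = {
    "\u00E9": "Latin e with acute",
    "\u04D3": "Cyrillic a with diaeresis",
    "\u0439": "Cyrillic short i (i with breve)",
    "\u00E7": "Latin c with cedilla",
    "\u0105": "Latin a with ogonek",
}

for ch, label in SAMPLES.items():
    # Each result should be a base letter followed by one combining mark.
    print(f"U+{ord(ch):04X} {label}: {' '.join(canonical_parts(ch))}")
```

    Note that the Cyrillic letters carrying accent-like marks (diaeresis,
    breve) decompose just as their Latin counterparts do; it is only the
    shape-distorting modifications that are left atomic.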
