Re: A proposed change of name for Latin Small Letter TH with Strikethrough

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Mar 06 2004 - 10:59:38 EST

    From: "Peter Kirk" <peterkirk@qaya.org>
    > > Sindhi *does* have a distinction between two "kaf" characters, as it
    > > writes unaspirated /k/ and aspirated /kh/ with distinct characters
    > > (not using a digraph for /kh/, as Urdu does). The plain /k/ is written
    > > with a "swash kaf" form (see U+06AA), while /kh/ is written with a
    > > "keheh" form (U+06A9). So there is a clear need for a plain-text
    > > distinction between two "kaf" letters in Sindhi.
    >
    >
    > Now I know that "swash kaf" is not the same as kaf, so the situation is
    > not as simple as I had remembered. But the point remains that letters
    > which seem to be graphical variants in one language may in fact be
    > distinct letters in another language. That is one good reason to avoid
    > hasty unification of characters. I don't think it applies to your
    > various th ligatures, but then there could well be a dictionary out
    > there somewhere which uses one of your supposedly equivalent ligatures
    > for the voiced th and another one for the unvoiced th.

    Don't we have a similar (but reversed) issue with the "oe" ligature in French?
    There it is treated as a glyph variant of the two letters "o" and "e" (one
    normally required by French typographic rules, which are sometimes regarded as
    orthographic rules too), whereas other languages treat "oe" as a separate,
    distinct letter.

    Unicode chose not to unify "oe" with the existing two letters, even though it
    is the normal presentation form in French when these two letters are written
    side by side. The exceptions are an accent on the e (for example "coéquipier")
    and a few words like "coexister", "coefficient", and "coercitif", which were
    historically written with a "tréma" (diaeresis) on the e to prevent the
    ligature. Modern French has dropped nearly all such trémas except in words
    like "Noël", simply because the ligature is now used only to join "o" with
    "eu", where the "o" is assimilated into the "eu", which is just pronounced
    longer, as in "coeur", "boeuf", or "choeur". So the tréma remains in "Noël",
    "Joël", and "Boël", which now tend to be written in some places (notably in
    proper names) with a grave accent or with an unvoiced "h" inserted as a
    separator.

    The same problem arises with the "ae" ligature in French, which is the normal
    form for "a" plus "e" without an accent. It is simply pronounced as a long "é",
    following the Roman Latin pronunciation (the "a" is assimilated into the
    following "e", pronounced "é", when a Roman Latin word is read in French).
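
    As a rough illustration of what "not unified" means in practice, the sketch
    below uses Python's standard unicodedata module to check that the "oe" and
    "ae" characters (U+0153 and U+00E6) are left untouched by all four
    normalization forms, unlike a compatibility ligature such as U+FB01. The
    helper name survives_normalization is invented for this example.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

def survives_normalization(ch: str) -> bool:
    """Return True if no normalization form rewrites the character."""
    return all(unicodedata.normalize(form, ch) == ch for form in FORMS)

# U+0153 and U+00E6 are the "oe" and "ae" characters discussed above;
# U+FB01 (the "fi" ligature) is included only as a contrasting case.
for ch in ("\u0153", "\u00e6", "\ufb01"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), survives_normalization(ch))

# Because U+0153 has no decomposition, the two spellings never compare equal
# after normalization alone.
print(unicodedata.normalize("NFKC", "c\u0153ur") == "coeur")
```

    So any equivalence between the ligature and the letter pair has to be
    supplied by a higher-level process, not by normalization.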

    One difficulty with disunifying characters is that it creates an additional
    semantic or orthographic distinction that does not exist in one language but
    may exist in others. So disunification creates cases where additional
    collation rules are needed to correctly handle a language in which the Unicode
    distinction is not significant (here French "oe", but tomorrow perhaps a
    "swash kaf" variant of "kaf" in non-Sindhi languages).

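    To make the collation point concrete, here is a minimal sketch of such an
    additional rule: a sort key that expands the ligatures to their letter pairs
    before comparing. The table FRENCH_EXPANSIONS and the function
    french_sort_key are hypothetical names for this example, and the key ignores
    accents entirely, so it is not a full French collation.

```python
# Hypothetical tailoring table: each ligature sorts as its two-letter expansion.
FRENCH_EXPANSIONS = {
    "\u0153": "oe",  # small oe ligature
    "\u0152": "OE",  # capital OE ligature
    "\u00e6": "ae",  # small ae
    "\u00c6": "AE",  # capital AE
}

def french_sort_key(word):
    """Primary key: expanded, case-folded text; tie-breaker: the original word."""
    expanded = "".join(FRENCH_EXPANSIONS.get(ch, ch) for ch in word)
    return (expanded.casefold(), word)

words = ["c\u0153ur", "cog", "coexister", "coeur", "code"]

# Plain code point order pushes the ligature spelling after every ASCII word.
print(sorted(words))

# With the expansion rule, both spellings of the word sort next to each other.
print(sorted(words, key=french_sort_key))
```

    Real implementations would normally get this behaviour from a tailored
    collation table rather than from hand-written keys.
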
    Are there similar issues with scripts that have many letter-form variants
    (notably Arabic), which may be considered equivalent in most cases but
    distinct in others? I know about the case of Devanagari, and how it was
    solved by explicitly encoding format characters to control the letter form
    within grapheme clusters.
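
    The two approaches can be compared with a short sketch, again using only
    Python's unicodedata module. The first part lists the three separately
    encoded "kaf" letters mentioned in this thread; the second builds Devanagari
    sequences in which ZWJ and ZWNJ request a particular letter form. The
    variable names are invented for this example, and how each sequence is
    actually displayed depends on the font and rendering engine.

```python
import unicodedata

# Separately encoded Arabic-script letters: kaf, keheh, swash kaf.
KAF_LETTERS = ["\u0643", "\u06a9", "\u06aa"]
for ch in KAF_LETTERS:
    # An empty decomposition means the letter is not mapped to another one.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), repr(unicodedata.decomposition(ch)))

# Format characters used to control letter forms.
ZWJ, ZWNJ = "\u200d", "\u200c"
KA, VIRAMA, SSA = "\u0915", "\u094d", "\u0937"

conjunct = KA + VIRAMA + SSA                 # default: conjunct form if available
half_form = KA + VIRAMA + ZWJ + SSA          # ZWJ requests the half form of KA
explicit_virama = KA + VIRAMA + ZWNJ + SSA   # ZWNJ requests a visible virama

for label, text in [("conjunct", conjunct), ("half form", half_form),
                    ("explicit virama", explicit_virama)]:
    print(label, [f"U+{ord(c):04X}" for c in text])

# Both controls have general category Cf (format).
print(unicodedata.category(ZWJ), unicodedata.category(ZWNJ))
```

    In the Devanagari case the base letters stay the same and only an invisible
    control differs, whereas the Arabic letters above are distinct code points.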


