Re: Exemplar Characters

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Nov 14 2005 - 13:09:26 CST

    From: "Mark Davis" <mark.davis@icu-project.org>
    > We use NFC for the exemplar character set. Any significant character
    > sequence can be included as well. For example, one can have [a-h {ch}
    > i-z], which indicates that {ch} is treated as a unit.

    What about apostrophes? They are present in some digraphs and trigraphs for
    some languages.

    For example, Breton has {ch} and {c'h} but no isolated {c}. One problem is:
    how can we represent the various ways to encode the apostrophe? {c'h} is
    frequent for Breton (using the ASCII single quote), but the correct encoding
    should be {c’h} (using the upper-comma apostrophe).

    How can we say that these two should be treated equivalently? Maybe with
    something like this: {c['’]h} ???
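
    One way to picture the equivalence asked about here is to fold the
    apostrophe variants to a single character before matching against the
    exemplar units. The following is only a minimal sketch under stated
    assumptions, not how any exemplar set implementation works: the names
    fold_apostrophes and split_letters are hypothetical, the choice of
    U+2019 as the canonical form is an assumption, and U+02BC is included
    as a third variant purely for illustration.

```python
# Hypothetical sketch: treat {c'h} and {c’h} as the same Breton letter by
# folding apostrophe look-alikes to one canonical character first.

# Assumed set of variants; U+02BC is an illustrative addition.
APOSTROPHE_VARIANTS = {"\u0027", "\u2019", "\u02BC"}
CANONICAL_APOSTROPHE = "\u2019"  # assumption: the "upper-comma" apostrophe


def fold_apostrophes(text, canonical=CANONICAL_APOSTROPHE):
    """Replace every known apostrophe variant with the canonical one."""
    return "".join(canonical if ch in APOSTROPHE_VARIANTS else ch for ch in text)


def split_letters(word, units=("c\u2019h", "ch")):
    """Split a word into letters, keeping multi-character units whole.

    Uses a greedy longest-match over the given units after folding
    apostrophes and lowercasing; anything else is a one-character letter.
    """
    folded = fold_apostrophes(word.lower())
    ordered = sorted(units, key=len, reverse=True)
    letters = []
    i = 0
    while i < len(folded):
        for unit in ordered:
            if folded.startswith(unit, i):
                letters.append(unit)
                i += len(unit)
                break
        else:
            letters.append(folded[i])
            i += 1
    return letters


if __name__ == "__main__":
    # Both spellings split into the same letters.
    print(split_letters("c'hoar"))
    print(split_letters("c\u2019hoar"))
    print(split_letters("chom"))
```

    With this folding, the ASCII spelling and the typographic spelling give
    the same result, which is roughly what a notation like {c['’]h} would
    express declaratively.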

    Although the correct form should be with the apostrophe, the ASCII quote is
    MUCH more frequent (this is also true for French and English; there, however,
    the apostrophe plays another role and does not create unbreakable
    digraphs/trigraphs as in Breton, where the three-character sequence is a
    SINGLE letter...)

    There are similar examples in other languages, like {’n} or {'n} for which
    there also exists a combined character in Unicode...
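
    The combined character alluded to here is presumably U+0149 (LATIN SMALL
    LETTER N PRECEDED BY APOSTROPHE); that identification is an assumption.
    A short sketch with Python's standard unicodedata module shows why NFC
    alone does not relate it to the two-character spellings: NFC leaves it
    untouched, and the compatibility form NFKC yields U+02BC followed by n,
    which matches neither {’n} (U+2019) nor {'n} (U+0027).

```python
import unicodedata

combined = "\u0149"  # assumed to be the combined character meant above

nfc = unicodedata.normalize("NFC", combined)
nfkc = unicodedata.normalize("NFKC", combined)


def codepoints(text):
    """Format a string as a list of U+XXXX code points for display."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    print(unicodedata.name(combined))
    print("NFC: ", codepoints(nfc))
    print("NFKC:", codepoints(nfkc))
    # Neither common two-character spelling equals the NFKC result.
    print(nfkc == "\u2019n", nfkc == "'n")
```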

    Also in Greek, {’Ε} is interpreted as capital Epsilon with tonos which can
    also be found encoded as a spacing tonos character before the capital letter
    {΄Ε} or often with an apostrophe or single quote followed by Epsilon...
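
    As a rough illustration of the Greek case, the sketch below folds a
    leading apostrophe, ASCII quote, or spacing tonos (U+0384) before a
    capital vowel into the precomposed letter with tonos. It is a naive
    assumption-laden example: the function name fold_greek_tonos is
    hypothetical, and the rule would also rewrite a genuine opening quote
    that happens to precede a capital vowel, so it is not safe on general
    text.

```python
import re
import unicodedata

# Marks assumed to stand in for the tonos before a capital vowel:
# ASCII quote, U+2019 apostrophe, U+0384 GREEK TONOS.
_TONOS_LIKE = "\u0027\u2019\u0384"
_CAPITAL_VOWELS = "\u0391\u0395\u0397\u0399\u039F\u03A5\u03A9"  # Α Ε Η Ι Ο Υ Ω

_PATTERN = re.compile(f"[{_TONOS_LIKE}]([{_CAPITAL_VOWELS}])")


def fold_greek_tonos(text):
    """Rewrite mark + capital vowel as the precomposed vowel with tonos.

    The vowel is combined with U+0301 and composed with NFC, which gives
    the precomposed capital (for example U+0388 for Epsilon).
    """
    def compose(match):
        return unicodedata.normalize("NFC", match.group(1) + "\u0301")

    return _PATTERN.sub(compose, text)


if __name__ == "__main__":
    for spelling in ("\u2019\u0395", "\u0384\u0395", "'\u0395"):
        folded = fold_greek_tonos(spelling)
        print([f"U+{ord(ch):04X}" for ch in folded])
```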

    Philippe.


