Re: Exemplar Characters

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Nov 17 2005 - 06:17:12 CST

    From: "Antoine Leca" <Antoine10646@leca-marti.org>
    > We certainly should NOT mix the primary collating order (A to Z for both
    > English, German, Italian, and French), which is what is used for
    > directory, with those exemplar characters we are talking about here.

    Certainly! But even for a single language with an appropriately tailored
    collation, the primary order alone does not tell whether two letters are
    the same or not. They are different as soon as they differ by anything
    other than the tailored case folding. Case is generally unified at a late
    collation level (the tertiary level in the default Unicode collation
    table), but this depends on how the collation is tailored for a language:
    a tailoring may add collation levels with higher priority than the case
    difference, and some languages may treat a case difference as marking
    distinct letters, so that conversion to uppercase, lowercase, or titlecase
    would change the meaning and the orthographic rules.
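
    To make the idea of successive levels concrete, here is a rough sketch in
    Python. It is only a toy and NOT the Unicode Collation Algorithm: I assume
    that the primary key is the base letters, the secondary key the combining
    marks, and the tertiary key the case pattern. The names toy_keys and
    first_difference are made up for this illustration.

```python
import unicodedata


def toy_keys(text):
    """Return (primary, secondary, tertiary) keys for a toy three-level comparison.

    Assumption: this only imitates the idea of collation levels; a real
    tailored collation would come from a proper collation library.
    """
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    marks = "".join(ch for ch in decomposed if unicodedata.combining(ch))
    primary = base.casefold()  # base letters only, case ignored
    secondary = marks          # accents and other combining marks
    tertiary = "".join("U" if ch.isupper() else "l" for ch in base)  # case pattern
    return primary, secondary, tertiary


def first_difference(a, b):
    """Name the first toy level at which a and b differ, or None if none does."""
    levels = ("primary", "secondary", "tertiary")
    for name, key_a, key_b in zip(levels, toy_keys(a), toy_keys(b)):
        if key_a != key_b:
            return name
    return None
```

    With this toy, first_difference("e", "\u00e9") reports "secondary" and
    first_difference("e", "E") reports "tertiary": both pairs are equal at the
    primary level, so the primary order alone cannot say whether they are the
    same letter.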

    I know that collation can be tailored for a given language, but can a locale
    specify which collation level is used for the case difference, or whether
    the case difference is significant? Maybe I have read that in the past, but
    I can't remember exactly. If a locale specifies this, then we have a good
    way to determine which letters or polygraphs should be listed as distinct
    and necessary (or recommended) in the exemplar and auxiliary characters:
    effectively, we can then list only lowercase letters for most languages
    (because uppercase letters are case folded to the same class in that
    language), and we can eliminate from those lists all variants that collate
    equally (except at the last level, the code point level).

    So exemplar and auxiliary characters could be checked automatically and
    simplified: the remaining differences are either case differences or code
    point differences.
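
    Such an automatic check could look roughly like the following Python
    sketch. The function simplify_exemplars is hypothetical, and I assume that
    the language's tailored collation is available as a key function; plain
    str.casefold stands in for it here, which a real locale would replace.

```python
def simplify_exemplars(candidates, key=str.casefold):
    """Keep one representative per equivalence class defined by `key`.

    Assumption: `key` stands in for a tailored collation key that ignores
    case and code point differences; str.casefold is only a placeholder.
    """
    chosen = {}
    for item in candidates:
        k = key(item)
        best = chosen.get(k)
        # Prefer the all-lowercase spelling as the class representative.
        if best is None or (item.islower() and not best.islower()):
            chosen[k] = item
    return sorted(chosen.values(), key=key)


print(simplify_exemplars(["A", "a", "B", "b", "Ch", "ch", "CH"]))
```

    Here the seven candidates collapse to the three lowercase entries, since
    the dropped ones differ from them only by case.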

    For Breton, for example, it would be sufficient to tailor the collation
    table so that the ASCII quote and the right curly quote collate together. In
    that case we only need to list in the exemplar characters the recommended
    form of the trigraph {c'h}, and not the variant with the other quote, which
    collates equivalently, nor {C'h}, which only has a case difference that is
    ignorable for exemplar characters. This solves the ambiguity and simplifies
    the problem.
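
    As a rough Python sketch of the Breton case: toy_breton_key below is a
    made-up stand-in for a tailored collation key that folds case and treats
    U+2019 (right curly quote) like the ASCII quote. Which spelling counts as
    the recommended one is passed in as a parameter, and the choice in the
    usage lines is for illustration only.

```python
# Assumption: the only tailoring modelled is "curly quote == ASCII quote".
QUOTE_TABLE = str.maketrans({"\u2019": "'"})


def toy_breton_key(text):
    """Hypothetical key: ignore case and the ASCII/curly quote distinction."""
    return text.casefold().translate(QUOTE_TABLE)


def pick_exemplars(candidates, preferred):
    """Group candidates by toy_breton_key and keep one form per group.

    A form listed in `preferred` wins; otherwise the first one seen is kept.
    """
    groups = {}
    for item in candidates:
        groups.setdefault(toy_breton_key(item), []).append(item)
    result = []
    for members in groups.values():
        keep = next((m for m in members if m in preferred), members[0])
        result.append(keep)
    return result


candidates = ["c'h", "c\u2019h", "C'h", "C\u2019h"]
# Illustration only: the curly-quote spelling is treated as recommended here.
print(pick_exemplars(candidates, preferred={"c\u2019h"}))
```

    All four spellings share one key, so only the spelling named as preferred
    survives in the list.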



    This archive was generated by hypermail 2.1.5 : Thu Nov 17 2005 - 06:19:23 CST