Exemplar Characters (was: Re: three questions about alphabet files at Michael Everson site)

From: Christopher Fynn (cfynn@gmx.net)
Date: Sun Nov 13 2005 - 22:52:17 CST


    Mark Davis wrote:
    > Logically speaking, the set of characters used by a language is quite
    > fuzzy, so there isn't really a black and white answer (see also
    > http://www.unicode.org/draft/reports/tr36/tr36.html#Language_Based_Security).
    >
    >
    > What we ended up doing in CLDR was having a core set of characters for a
    > language (the 'exemplarCharacters'), plus an additional set of
    > characters that would be seen in customary usage. For example, for
    > English we have [a-z] in the main set, and [á à ă â å ä ā æ ç é è ĕ ê ë
    > ē í ì ĭ î ï ī ñ ó ò ŏ ô ö ø ō œ ß ú ù ŭ û ü ū ÿ] in the auxiliary set.
    > (http://unicode.org/cldr/data/common/main/en.xml)
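
    [Editorial sketch] The main/auxiliary split described above can be
    illustrated with a small check that sorts the letters of a string into
    the two sets. The sets below are copied from the English example in
    this message, not read from CLDR, and the function name `classify` is
    hypothetical; treat this as a minimal sketch rather than how CLDR
    itself uses the data.

```python
import unicodedata

# Illustrative sets only, transcribed from the English example in the message.
# They are NOT loaded from CLDR and may differ from any given CLDR release.
MAIN = set("abcdefghijklmnopqrstuvwxyz")
AUXILIARY = set(unicodedata.normalize(
    "NFC", "áàăâåäāæçéèĕêëēíìĭîïīñóòŏôöøōœßúùŭûüūÿ"))


def classify(text):
    """Sort each letter of `text` into main, auxiliary, or other."""
    buckets = {"main": set(), "auxiliary": set(), "other": set()}
    # Normalize to NFC so precomposed letters compare equal to the set members.
    for ch in unicodedata.normalize("NFC", text).lower():
        if not ch.isalpha():
            continue  # skip spaces, punctuation, digits
        if ch in MAIN:
            buckets["main"].add(ch)
        elif ch in AUXILIARY:
            buckets["auxiliary"].add(ch)
        else:
            buckets["other"].add(ch)
    return buckets


if __name__ == "__main__":
    result = classify("Naïve café, Łódź")
    for name, chars in result.items():
        print(name, sorted(chars))
```

    Letters that land in "other" are simply outside both illustrative
    sets; given how fuzzy the boundary is, that is a hint rather than a
    verdict that the text is not English.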

    Mark

    Should the "exemplar characters" for a language include all the
    base+combining character *combinations* frequent in that language,
    or should all the base characters and all the combining characters
    be listed separately?

      - Chris
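
    [Editorial sketch] The two options in the question above can be made
    concrete with Unicode normalization: listing combinations corresponds
    roughly to keeping each exemplar in composed form (NFC), while listing
    base and combining characters separately corresponds to collecting
    the code points of the decomposed form (NFD). The function names
    below are hypothetical, and this does not claim to show which
    representation CLDR chose.

```python
import unicodedata


def as_combinations(exemplars):
    """Option 1: keep each exemplar as one composed (NFC) unit."""
    return {unicodedata.normalize("NFC", e) for e in exemplars}


def as_separate(exemplars):
    """Option 2: list base letters and combining marks individually (NFD)."""
    parts = set()
    for e in exemplars:
        parts.update(unicodedata.normalize("NFD", e))
    return parts


def admitted_by_separate(text, separate):
    """True if every decomposed code point of `text` is in the separate set."""
    return all(ch in separate for ch in unicodedata.normalize("NFD", text))


if __name__ == "__main__":
    exemplars = ["e", "n", "\u00e9", "\u00f1"]  # e, n, e-acute, n-tilde
    combos = as_combinations(exemplars)
    separate = as_separate(exemplars)
    print(sorted(combos))
    print([f"U+{ord(c):04X}" for c in sorted(separate)])
    # n-acute (U+0144) was never listed as a combination, yet the separate
    # listing admits it, because both 'n' and U+0301 are present.
    print("\u0144" in combos, admitted_by_separate("\u0144", separate))
```

    The trade-off this shows: the separate listing is shorter but no
    longer records which base+mark pairings actually occur in the
    language, so it admits pairings that were never listed.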

    > For the language in question, the latter is derived from dictionaries
    > and style guidelines for major publications in the language. We don't
    > have this in place for all languages yet, but will be expanding coverage
    > in the CLDR 1.4 release, so feedback is welcome.
    >
    > Mark
    >


