Re: Re: Re: three questions about alphabet files at Michael Everson site

From: Jukka K. Korpela
Date: Tue Nov 15 2005 - 14:11:02 CST

    On Tue, 15 Nov 2005, Marc Bruguières wrote:

    >> There is the family name du Roscoät. This is the only example I can come
    >> up with.
    > I fail to understand why is this important for Unicode or truly
    > internationalized software?

    It is somewhat marginal, and I'm afraid this discussion has largely taken
    a wrong path. The first task should not be to specify character
    collections for different locales - this should take place in
    appropriate national and coöperational fora - but to discuss
    a) which collections should be present in the CLDR (and perhaps as
    required/recommended/purely optional)
    b) how these collections are defined: what they mean and how they are

    Just knowing which characters appear in a given language is interesting
    trivia. But how will it be used? Different uses may require
    different types of collections.

    > In real life, French (English, German, Arabic, etc.) texts will contain
    > many more characters than those in this list: all kinds of dashes,
    > quotes, symbols, not to mention mathematical symbols, texts from other
    > scripts, etc.

    Indeed. I see no reason to omit punctuation. In fact, punctuation is often
    more important than individual letters. When desired, collections limited
    to letters can be formed from the primary data, e.g. simply by taking a
    collection and selecting characters with a General Category value that
    indicates a letter.
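
The filtering described above can be sketched in a few lines. This is only an illustration, assuming Python's standard `unicodedata` module as the source of General Category values; the function name `letters_only` and the sample collection are hypothetical and not taken from CLDR data.

```python
import unicodedata


def letters_only(collection):
    """Return the characters whose General Category is a letter (Lu, Ll, Lt, Lm, Lo)."""
    return {ch for ch in collection if unicodedata.category(ch).startswith("L")}


# Hypothetical mixed collection: some letters plus punctuation (not real CLDR data).
sample = set("aàâeéèêëiîïoôuùûüyÿcç") | set("«»’-–—")

print(sorted(letters_only(sample)))
```

Keeping punctuation in the primary data and deriving the letter-only view on demand means nothing is lost for uses that need the punctuation.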

    > I thought Unicode was supposed to open up all characters to us, not
    > restrict us to small sets

    Well, actually, both. Conformance to the Unicode standard does not require
    support for all characters; in fact, an implementation might support a
    fairly limited repertoire. But most importantly, the collections of
    characters used in different languages help in many ways, e.g. in checking
    input data consistency, in selecting (non-Unicode) encodings that would be
    feasible, in selecting the fonts that can be used, in designing fonts,
    in considering which characters should be easily produced (when designing
    keyboard layout or input mechanisms in general), in text scanning, etc.
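
Two of the uses listed above, checking input data and selecting feasible non-Unicode encodings, can be sketched as follows. This is a rough illustration under stated assumptions: the function names, the candidate encoding list, and the tiny sample collection are all hypothetical, and a real check would start from an actual per-language collection.

```python
def feasible_encodings(collection,
                       candidates=("ascii", "iso-8859-1", "iso-8859-15", "cp1252")):
    """Return the candidate encodings that can represent every character in the collection."""
    text = "".join(sorted(collection))
    feasible = []
    for name in candidates:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this encoding
        feasible.append(name)
    return feasible


def outside_collection(text, collection):
    """Return the non-whitespace characters of text that are not in the collection."""
    return sorted({ch for ch in text if ch not in collection and not ch.isspace()})


# Hypothetical collection for illustration only.
collection = set("cœur")

print(feasible_encodings(collection))
print(outside_collection("cœur ǎ", collection))
```

Note that the answers depend entirely on what the collection is meant to contain, which is why its definition matters more than the list itself.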

    But the different needs may imply a need for different types of
    information. I'm afraid the classification into exemplarChars and
    auxiliary exemplarChars is too coarse (and the names are misleading).

    Jukka "Yucca" Korpela
