Re: three questions about alphabet files at Michael Everson site

From: Mark Davis (mark.davis@icu-project.org)
Date: Sat Nov 12 2005 - 13:24:40 CST

    Logically speaking, the set of characters used by a language is quite
    fuzzy, so there isn't really a black and white answer (see also
    http://www.unicode.org/draft/reports/tr36/tr36.html#Language_Based_Security).

    What we ended up doing in CLDR was having a core set of characters for a
    language (the 'exemplarCharacters'), plus an additional set of
    characters that would be seen in customary usage. For example, for
    English we have [a-z] in the main set, and [á à ă â å ä ā æ ç é è ĕ ê ë
    ē í ì ĭ î ï ī ñ ó ò ŏ ô ö ø ō œ ß ú ù ŭ û ü ū ÿ] in the auxiliary set.
    (http://unicode.org/cldr/data/common/main/en.xml)
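
    As a rough illustration of the main/auxiliary split, here is a
    minimal Python sketch. It assumes the two English sets are simply
    hand-copied from the lists quoted above rather than loaded from the
    CLDR data files, and the function name classify is hypothetical,
    not part of any CLDR tooling.

```python
import unicodedata

# Assumption: sets hand-copied from the message above, not read from CLDR.
MAIN = set("abcdefghijklmnopqrstuvwxyz")
AUXILIARY = set(unicodedata.normalize(
    "NFC", "áàăâåäāæçéèĕêëēíìĭîïīñóòŏôöøōœßúùŭûüūÿ"))


def classify(text):
    """Bucket the letters of `text` into main, auxiliary, or other."""
    buckets = {"main": set(), "auxiliary": set(), "other": set()}
    # Normalize so that decomposed input (o + combining diaeresis)
    # compares equal to the precomposed characters in the sets.
    for ch in unicodedata.normalize("NFC", text.lower()):
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace
        if ch in MAIN:
            buckets["main"].add(ch)
        elif ch in AUXILIARY:
            buckets["auxiliary"].add(ch)
        else:
            buckets["other"].add(ch)
    return buckets


if __name__ == "__main__":
    print(classify("maelström"))
```

    A word whose letters all land in "main" or "auxiliary" is plausible
    in customary English text under this model; anything in "other" only
    says the letter lies outside both sets, not that the word is wrong.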

    For each language, the auxiliary set is derived from dictionaries
    and style guidelines for major publications in that language. We don't
    have this in place for all languages yet, but will be expanding coverage
    in the CLDR 1.4 release, so feedback is welcome.

    Mark

    Charles Levert wrote:

    >Thanks for the clarification, Marc.
    >Most of this is genuinely new to me.
    >
    >(Your MUA, unless this is the work of the mailing list
    >software, uses just 'Content-Type: text/plain' for an
    >actual content that's a mix of windows-1252 and UTF-8,
    >instead of converting everything to one IANA charset and
    >using 'Content-Type: text/plain; charset=foo'. Thus,
    >your message appears garbled, at least to my MUA, so I
    >will attempt to correct things in the citation below.)
    >
    >
    >* On Friday 2005-11-11 at 23:43:07 +0100, Marc Bruguières wrote:
    >
    >
    >>Charles Levert:
    >>
    >>
    >>>* On Thursday 2005-11-10 at 20:37:05 +0100, Chris Jacobs wrote:
    >>>
    >>>
    >>>>Charles Levert wrote:
    >>>>
    >>>>
    >>>>> maelström (mot néerlandais), also spelled malstrom
    >>>>>
    >>>>>
    >>>>I am dutch and I am rather surprised to see that maelström is a mot
    >>>>néerlandais.
    >>>>I always thought it was a scandinavian word.
    >>>>
    >>>>
    >>In its current spelling but it comes from
    >>“maalstroom” in Dutch, the old Dutch spelling asserted
    >>in 1595 was “maelstrom”. All from the Robert historique
    >>de la langue française.
    >>
    >>As you will no doubt know “ae” is an old Dutch
    >>spelling of “aa” (long a), often seen in the Flemish
    >>names in French-speaking areas (Schaerbeek in “French”
    >>around Brussels, or people's names like “Michel de
    >>Swaen”, or Jean Bart (“Jan Baert” in West-Flemish, he
    >>was from Dunkirk) the famous sailor who was so successful
    >>against first the Dutch, then the English).
    >>
    >>http://www.schaerbeek.org/
    >>http://www.flandres.net/fherry_fr.asp
    >>http://perso.wanadoo.fr/jean-bart/Accueil/Accueil.html
    >>http://seeten.univ-littoral.fr/q_jbart.htm
    >>
    >>“Maal/malen” means something in Dutch I believe
    >>(according to my French dictionary), what does “mael”
    >>mean in Norwegian?
    >>
    >>
    >>
    >>>So did I! :-)
    >>>I was blindly citing my trusty (?) old “petit Larousse illustré 1981”.
    >>>It may have been corrected since.
    >>>
    >>>
    >>No need, I think. My Oxford English Dictionary also
    >>ascribes it ultimately to Dutch.
    >>
    >>
    >>
    >>
    >>>>Is the ö supposed to be an o umlaut or an o diaeresis?
    >>>>
    >>>>
    >>>I don't know for sure, but the absence of another
    >>>vowel next to it makes me lean towards umlaut.
    >>>
    >>>
    >>
    >>Umlaut if it changes the sound (um-laut), that's the case here,
    >>I think. Although I don't know any Scandinavian language.
    >>
    >>
    >
    >
    >
    >
    >
    >


