Re: CLDR

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Tue May 16 2006 - 02:32:40 CDT


    On Tue, 16 May 2006, Balasankar wrote:

    > Should the union of the exemplar and auxiliary exemplar character sets
    > contain all the possible characters used in a particular language?

    No. It is impossible to list the characters used in a language; the
    set is very fuzzy, with membership ranging from core characters (such as
    "a" in English) through marginal characters (like "é", i.e. "e" with
    acute, in English) to characters that may appear in special words, typically
    borrowings, perhaps _very_ rarely. Moreover, these sets are currently
    supposed to list _letters_ only. The two sets make it possible to
    give a rather rough description of the letters used in a language, and the
    choices made are often rather debatable.
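    To make the tiers concrete, here is a minimal Python sketch that sorts the letters of a word into the main exemplar set, the auxiliary set, or neither. The sets below are hypothetical, abbreviated stand-ins loosely modeled on English, not the actual CLDR data, and the function name classify_letters is invented for illustration. Note that "café" passes only because someone decided to put "é" into the auxiliary set; the code cannot tell whether that choice was a good one.

```python
import unicodedata

# Hypothetical, abbreviated sets for illustration only; these are NOT the
# actual CLDR exemplar data for English.
MAIN_EXEMPLARS = set("abcdefghijklmnopqrstuvwxyz")
AUXILIARY_EXEMPLARS = set("áàâäçéèêëíîïñóôöúùûü")


def classify_letters(text: str) -> dict[str, str]:
    """Map each distinct letter in text to 'main', 'auxiliary' or 'outside'.

    Non-letters are skipped, since the exemplar sets are meant to list
    letters only. The text is NFC-normalized and lowercased first, because
    the sets above contain only lowercase, precomposed forms.
    """
    normalized = unicodedata.normalize("NFC", text).lower()
    result: dict[str, str] = {}
    for ch in normalized:
        if not ch.isalpha():
            continue  # digits, hyphens, spaces etc. are not classified
        if ch in MAIN_EXEMPLARS:
            result[ch] = "main"
        elif ch in AUXILIARY_EXEMPLARS:
            result[ch] = "auxiliary"
        else:
            result[ch] = "outside"
    return result


if __name__ == "__main__":
    # Print, for each word, the letters that fall outside both sets.
    for word in ["cafe", "café", "Strauß", "smørgåsbord"]:
        outside = sorted(c for c, k in classify_letters(word).items() if k == "outside")
        print(word, outside)
```

    With these made-up sets, "ß" in "Strauß" lands outside both sets, even though a German name like that can turn up in English text; deciding where such letters belong is exactly the debatable choice.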

    It isn't even clear what the intended _use_ of the sets is, or what their
    actual use will be. There is a large number of imaginable uses, each with
    its own implications for what the grounds for defining the sets should
    really be. I'm afraid the (mostly implicit) criteria applied now make the
    sets incommensurable across languages.

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    

