Re: Common Locale Data Repository Project

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Apr 24 2004 - 11:26:46 EDT


    From: "Peter Constable" <petercon@microsoft.com>
    > > For now, the only workable solution to solve these issues is found in
    > > supplementary libraries in ICU which support locale aliases. (Yes I
    > > use the term Locale because this is the term that Java gives to this
    > > identification,
    >
    > NO. That is the term Java (and other things) give to a *different*
    > identification. There are languages, there are cultures/locales. The two
    > are not the same.

    Then there will remain a problem in Java locales, unless the Java community
    accepts that the language part of a locale will contain the language subtags
    of RFC 3066 or its successor, so that the API can implement a language
    resolver for that part only, ignoring the second and third parameters, which
    would then specify only the other (non-language) elements of a Locale.

    For now it's well known that if you create a Java application with resource
    bundles for Hebrew, you have to use the "iw" language parameter to name your
    bundle; if you use "he", then the same properties file or class in the bundle
    will not be found on an OS that the Java runtime determines as supporting the
    "iw" locale, and the application will then display only the default locale (most
    often English). Note that Hebrew is part of the set of fully supported languages
    in Java. I doubt that the JRE will be changed to use the "he" code by default
    as long as the locale resolver in Java is not updated to use a cleverer
    algorithm than simple equality of language codes.
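
    As a minimal sketch of what "more clever than equality" could mean for the
    Hebrew case: compare codes only after mapping legacy ISO 639 codes to their
    current replacements. The class and method names below are illustrative
    assumptions, not part of any Java API.

```java
import java.util.Locale;
import java.util.Map;

final class LanguageAliases {
    // Legacy ISO 639 codes paired with the codes that replaced them.
    private static final Map<String, String> LEGACY_TO_CURRENT = Map.of(
            "iw", "he",   // Hebrew
            "in", "id",   // Indonesian
            "ji", "yi");  // Yiddish

    // Hypothetical helper: trims, lower-cases, then maps a legacy code forward.
    static String canonical(String code) {
        String lower = code.trim().toLowerCase(Locale.ROOT);
        return LEGACY_TO_CURRENT.getOrDefault(lower, lower);
    }

    // Two codes name the same language if their canonical forms are equal.
    static boolean sameLanguage(String a, String b) {
        return canonical(a).equals(canonical(b));
    }

    public static void main(String[] args) {
        System.out.println(sameLanguage("he", "iw")); // true
        System.out.println(sameLanguage("he", "en")); // false
    }
}
```

    With such a comparison, a bundle named with "he" and a runtime reporting "iw"
    would be treated as the same language instead of falling through to the
    default locale.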

    Same problem for the Traditional Chinese language: Java supports it natively only
    with the "TW" country code, kept separate from the "zh" language code. If things
    must change later, the Java runtime should learn to work with a "zh-Hant"
    language identifier to be used in every country where the language is used.
    Using "zh_TW" (i.e. a separate "zh" language code and a separate "TW" country
    code) has the bad effect of also applying other locale conventions appropriate only
    for Taiwan, but not for Macau, Hong Kong, Singapore, Réunion and other
    Indian Ocean, South Asian and South African countries or territories where this
    language is used with other national locale conventions (currency, time and
    numeric formats, phone numbers...).
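
    To illustrate keeping the script apart from the country, here is a small
    sketch. It assumes a Java version with Locale.Builder and script support
    (Java 7 or later), which is newer than the runtime discussed in this message;
    the class and helper names are hypothetical.

```java
import java.util.Locale;

final class ScriptVersusRegion {
    // Hypothetical helper: same language and script, region chosen independently.
    static Locale traditionalChineseFor(String region) {
        return new Locale.Builder()
                .setLanguage("zh")
                .setScript("Hant")
                .setRegion(region)
                .build();
    }

    public static void main(String[] args) {
        for (String region : new String[] {"TW", "HK", "MO"}) {
            Locale locale = traditionalChineseFor(region);
            // The script identifies the written form; the region stays free
            // to select currency, date and number conventions.
            System.out.println(locale.toLanguageTag()
                    + " script=" + locale.getScript()
                    + " region=" + locale.getCountry());
        }
    }
}
```

    Here the "zh-Hant" part is identical for all three locales, so text resources
    could be shared while the regional conventions differ.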

    In fact I would like to see "Traditional" and "Simplified" Chinese treated as
    distinct languages in the same family. An application would do better to use "zht"
    and "zhs" language codes to make the distinction, so that "zh" would become an
    identifier for a family of Han-written languages rather than a language
    identifier, and so a legacy code. This also means changes in the Locale resolver,
    so that an OS or user locale which indicates "zhs" or "zht" will first look for
    resources marked with its own language code, and only then attempt to
    use a "zh" resource if none is found.
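
    The lookup order described above could be sketched as follows. Note that
    "zht" and "zhs" are only the codes proposed in this message, not registered
    language codes, and the class, map and method names are assumptions made for
    illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FamilyFallback {
    // Hypothetical mapping from a specific language code to its family code.
    private static final Map<String, String> FAMILY = Map.of(
            "zht", "zh",
            "zhs", "zh");

    // The specific code is tried first, then the family code if one is known.
    static List<String> candidates(String userLanguage) {
        List<String> chain = new ArrayList<>();
        chain.add(userLanguage);
        String family = FAMILY.get(userLanguage);
        if (family != null) {
            chain.add(family);
        }
        return chain;
    }

    // Returns the first candidate that has a resource, else the default.
    static String resolve(String userLanguage, Set<String> available,
                          String defaultLanguage) {
        for (String candidate : candidates(userLanguage)) {
            if (available.contains(candidate)) {
                return candidate;
            }
        }
        return defaultLanguage;
    }
}
```

    For example, resolve("zht", Set.of("zh", "en"), "en") selects the legacy "zh"
    resource, while the same call with "zht" among the available resources
    selects "zht" directly.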

    A Locale resolver should be able to determine, from each properties file or class of
    a bundle, which codes it supports, and a degree/priority of matching relative to
    other localized resources. But I have not seen anything that suggests that an
    application can provide such a Locale resolver; for now each
    application has to write its own resolver to map a user locale to a matching
    application-defined supported locale. The automatic resolver in Java (but other
    systems like POSIX have the same caveats) seems quite weak, as does the
    resolution order (a bit more general) currently suggested in RFC 3066, which is
    exactly what was implemented in Java...
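
    A rough sketch of the kind of application-supplied resolver I have in mind:
    each bundle declares the codes it supports together with a match priority,
    and the resolver picks the bundle with the highest priority for the user's
    language. Everything here (class name, method names, the numeric priorities,
    the bundle names) is an assumption for illustration, not an existing Java
    facility.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class ScoredResolver {
    // bundle name -> (language code -> match priority, higher is better)
    private final Map<String, Map<String, Integer>> declared = new LinkedHashMap<>();

    // A bundle announces which codes it can serve and how well.
    public void declare(String bundle, Map<String, Integer> supported) {
        declared.put(bundle, supported);
    }

    // Picks the bundle with the highest priority for the user's language.
    // On a tie the bundle declared first wins; no match yields an empty result.
    public Optional<String> resolve(String userLanguage) {
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, Map<String, Integer>> entry : declared.entrySet()) {
            int score = entry.getValue().getOrDefault(userLanguage, 0);
            if (score > bestScore) {
                best = entry.getKey();
                bestScore = score;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

    A Hebrew bundle could then declare both "iw" and "he" at full priority, and a
    generic "zh" bundle could declare "zht" and "zhs" at a lower priority than a
    dedicated bundle would.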


