Re: are Unicode codes somehow specified in official national linguistic literature ? (worldwide)

From: Mark Davis (
Date: Wed Jun 14 2006 - 20:47:16 CDT

    Actually, the cases discussed do involve fallback from one family to
    another, such as from (a particular kind of) Sami to Norwegian, or Breton to
    French. We had quite a number of discussions about this, and finally
    concluded that we should not build the fallback in at a low level, since the
    particular fallback may very well be an individual preference, but that the
    best we should do would be to have a structure that suggested a default
    fallback for the locale, which could be overridden according to the user
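
    A minimal sketch of that kind of structure, under stated assumptions: the
    class FallbackSketch, the table DEFAULT_PARENTS and the method chain are
    hypothetical names, not part of CLDR, ICU or the JDK. The locale codes
    (se for Northern Sami, nb for Norwegian Bokmål, br for Breton, fr for
    French) are only one possible reading of the examples above. A user table
    is consulted first, then the suggested default, then plain truncation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from CLDR, ICU, or the JDK.
class FallbackSketch {

    // Suggested default parents, keyed by locale id. The entries are
    // illustrative assumptions (Sami -> Norwegian, Breton -> French).
    static final Map<String, String> DEFAULT_PARENTS = new HashMap<>();
    static {
        DEFAULT_PARENTS.put("se", "nb");
        DEFAULT_PARENTS.put("br", "fr");
    }

    // Plain truncation: "se_NO" -> "se"; a bare language code yields null.
    static String truncate(String id) {
        int cut = id.lastIndexOf('_');
        return cut < 0 ? null : id.substring(0, cut);
    }

    // Builds the lookup chain for a locale id, always ending in "root".
    // Precedence at each step: user override, suggested default, truncation.
    static List<String> chain(String id, Map<String, String> userParents) {
        List<String> result = new ArrayList<>();
        String current = id;
        // The contains() check guards against cycles in either table.
        while (current != null && !result.contains(current)) {
            result.add(current);
            String next = userParents.get(current);
            if (next == null) {
                next = DEFAULT_PARENTS.get(current);
            }
            if (next == null) {
                next = truncate(current);
            }
            current = next;
        }
        if (!result.contains("root")) {
            result.add("root");
        }
        return result;
    }

    public static void main(String[] args) {
        // No user preference: the suggested default applies.
        System.out.println(chain("se_NO", new HashMap<>()));   // [se_NO, se, nb, root]

        // The user prefers Finnish as the fallback for Sami.
        Map<String, String> prefs = new HashMap<>();
        prefs.put("se", "fi");
        System.out.println(chain("se_NO", prefs));             // [se_NO, se, fi, root]
    }
}
```

    Keeping the default in a data table rather than in the lookup code is what
    lets an individual preference replace it without touching the low level.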


    On 6/14/06, Philippe Verdy <> wrote:
    > I already submitted some remarks there, but it's been a long time, and
    > the CLDR has evolved (as well as the ICU library) and my initial comments
    > may look outdated regarding the new developments.
    > But this bug report is not really discussing the fallback mechanism from
    > one language to a language family, but from a variant to a language, or the
    > fallback for languages that have multiple codes or legacy codes (he/iw,
    > in/id) as seen in Java VMs, where the legacy codes (like iw, in) are still
    > the only ones that work, since this preserves the compatibility of old
    > applications that depended on them for finding their resources with the
    > standard class loader of Java 1.3/1.4 (and even 5.0).
    > I still hope that the successor of RFC 3066 will come soon to describe
    > correctly the new locale identifiers (and especially the new ISO 15924 field
    > for the indication of scripts).
    > But given that ISO 639-3 is still not finalized, it will be hard to find a
    > definitive solution for designating locales and all their known aliases, and
    > still preserve the compatibility of legacy applications depending on these
    > identifiers.
    > ICU for now proposes a temporary solution for resolving the resource
    > fallback path, but it certainly requires more thought to handle all
    > possible cases (and the interaction of language identifiers with ISO 3166
    > country/region identifiers, or the new aliases introduced now by deprecating
    > the ISO 3166 country/region identifiers in favor of more precise ISO 639-3
    > language identifiers).
    > The current locale fallback mechanism implemented in legacy applications
    > is most often fixed and various systems use different fallback algorithms to
    > determine alternate locales. In Java for example, this mechanism also
    > interacts not only with the user settings, but also with the local system
    > settings, when no user locale matches a given resource id. But there's
    > still no way in Java to fall back beyond the first field of the locale id,
    > as its parent is a single root, and not another locale.
    > Even the Java Locale class still does not include a constructor to specify
    > the script identifier (one could specify it in the variant identifier, but
    > its place in third position, after the country identifier, is not the best
    > one for correct locale resolution, as the script should come in second
    > place, between the language code and the region code). If one uses the field
    > normally reserved for the country to set the script code, it won't interact
    > cleanly with legacy applications that use country codes.
    > So one must use one's own class loader, with its own fallback mechanism,
    > and create a new class to extend the Locale object, and implement various
    > tricks to make it work with the standard locale interface. This is more or
    > less what ICU does to support extended locale identifiers and aliases.
    > ----- Original Message -----
    > *From:* Mark Davis <>
    > *To:* Philippe Verdy <>
    > *Cc:* Erkki Kolehmainen <> ; Cristian Secară<>;
    > *Sent:* Wednesday, June 14, 2006 9:08 PM
    > *Subject:* Re: are Unicode codes somehow specified in official national
    > linguistic literature ? (worldwide)
    > There is a planned mechanism: see
    > (This was planned for 1.4, but delayed since we didn't have enough data to
    > warrant adding the mechanism.)