New Locale Proposal

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Fri Sep 15 2000 - 22:11:08 EDT


I am working on a new locale proposal. This seems like a great time to jump
in. Unfortunately I have not been able to pull the mail archives to go
back over the latest discussions, so I will have missed much.

The locale will consist of three parts:

1) A modified, lowercase RFC 1766bis language tag

2) An ISO 3166 country code

3) A variant

The three parts are separated by underscores, to distinguish them from the
'-' separators within each of the parts.
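
As a rough illustration of the three-part layout, here is a minimal Python sketch that splits a locale string on underscores. The names Locale and parse_locale are hypothetical, and the case folding (lowercase language, uppercase country) is an assumption drawn from the examples in this message, not a settled part of the proposal.

```python
from typing import NamedTuple, Optional


class Locale(NamedTuple):
    language: str            # may itself contain '-' (e.g. "zh-hak")
    country: Optional[str]   # ISO 3166 country code, if present
    variant: Optional[str]   # free-form variant, if present


def parse_locale(text: str) -> Locale:
    """Split 'language_COUNTRY_VARIANT' on the first two underscores."""
    parts = text.split("_", 2)
    language = parts[0].lower()
    country = parts[1].upper() if len(parts) > 1 and parts[1] else None
    variant = parts[2] if len(parts) > 2 and parts[2] else None
    return Locale(language, country, variant)


print(parse_locale("fr_FR_EURO"))
print(parse_locale("zh-hak_CN"))
```

Because the split is on '_' only, any '-' inside the language part is left untouched.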

From RFC 1766bis

  The primary tag must be:
        An ISO 639 2-letter language code
        An ISO 639-2 3-letter language code
        i-
        x-

  The first subtag when following a 2-letter or 3-letter code is
distinguished as follows:
        If 2-letter, it is an ISO 3166-1 country code
        If 3-letter, it is an ISO 639-2 language code
        If 4-letter, it is an ISO/DIS 15924 script code
        If 5-8 letters, it may be of any value
  The first subtag when following i- or x- may have 1-8 letters and
may represent any value.

  The second subtag is distinguished as follows:
        If 2-letter, it is an ISO 3166-2 region code
        If 3-letter, it is an ISO 639-2 language code
        If 4-letter, it is an ISO/DIS 15924 script code
        If 5-8 letters, it may be of any value

  Subsequent subtags may have any value.
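
The length-based rules above can be sketched in Python roughly as follows. This is only an illustration of the rules as quoted here: the function name classify_tag and the label strings are hypothetical, and applying the second-subtag rules after i- or x- as well is an assumption.

```python
from typing import Dict, List, Tuple

# Meaning of a subtag by length, per position, as quoted from RFC 1766bis above.
FIRST_SUBTAG: Dict[int, str] = {
    2: "ISO 3166-1 country code",
    3: "ISO 639-2 language code",
    4: "ISO/DIS 15924 script code",
}
SECOND_SUBTAG: Dict[int, str] = {
    2: "ISO 3166-2 region code",
    3: "ISO 639-2 language code",
    4: "ISO/DIS 15924 script code",
}


def _by_length(subtag: str, table: Dict[int, str]) -> str:
    n = len(subtag)
    if n in table:
        return table[n]
    if 5 <= n <= 8:
        return "any value"
    return "not covered by the quoted rules"


def classify_tag(tag: str) -> List[Tuple[str, str]]:
    """Label each subtag of an RFC 1766bis-style tag by position and length."""
    subtags = tag.lower().split("-")
    primary = subtags[0]
    if primary in ("i", "x"):
        result = [(primary, "i- or x- prefix")]
    elif len(primary) == 2:
        result = [(primary, "ISO 639 2-letter language code")]
    elif len(primary) == 3:
        result = [(primary, "ISO 639-2 3-letter language code")]
    else:
        raise ValueError("unrecognized primary tag: %r" % primary)

    for position, subtag in enumerate(subtags[1:], start=1):
        if position == 1 and primary in ("i", "x"):
            kind = "any value" if 1 <= len(subtag) <= 8 else "not covered by the quoted rules"
        elif position == 1:
            kind = _by_length(subtag, FIRST_SUBTAG)
        elif position == 2:
            kind = _by_length(subtag, SECOND_SUBTAG)
        else:
            kind = "any value"   # subsequent subtags may have any value
        result.append((subtag, kind))
    return result


for example in ("en-us", "zh-hak", "i-klingon"):
    print(example, classify_tag(example))
```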

The modifications to RFC 1766bis to make it better suited for locales are as
follows:

1) Normalize to a single form when possible. Use the ISO 639-1 code instead
of the 639-2 code if one exists (eng-us -> en_US). Use a single language
designation rather than a language/variant pair (e.g. no-nynorsk -> nn_NO).
Replace obsolete values with new codes (iw -> he). Replace i- codes with SIL
codes, except Klingon.
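
A minimal Python sketch of this normalization step, for the language part only, might look like the following. The function name normalize_language is hypothetical and the tables hold just a few sample entries; a real implementation would need the complete ISO 639 tables. The i- code to SIL code mapping is left out because no table for it is given here.

```python
# Sample entries only; these tables are illustrative, not complete.
ISO639_2_TO_1 = {"eng": "en", "fra": "fr", "deu": "de"}   # prefer 2-letter codes
OBSOLETE = {"iw": "he", "in": "id", "ji": "yi"}           # withdrawn ISO 639 codes
WHOLE_TAG = {"no-nynorsk": "nn"}                          # language/variant -> single code


def normalize_language(tag: str) -> str:
    """Reduce a language tag to a single preferred form where one is known."""
    tag = tag.lower()
    if tag in WHOLE_TAG:
        return WHOLE_TAG[tag]
    primary, sep, rest = tag.partition("-")
    primary = ISO639_2_TO_1.get(primary, primary)
    primary = OBSOLETE.get(primary, primary)
    return primary + sep + rest


for example in ("eng", "no-nynorsk", "iw", "zh-hak"):
    print(example, "->", normalize_language(example))
```

Tags with no entry in any table, such as zh-hak, pass through unchanged.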

2) Country codes that are part of RFC1766 become locale country codes.
en-us -> en_US

3) Variants that are not related to language are locale variants.
fr_FR_EURO

4) Break the "Applications should always treat a language tag as a single
token" rule by specifying two forms of RFC1766 languages: a single ISO 639-1
or 639-2 code, or a 639-1/639-2 sub-language pair. In the case of the
639-1/639-2 pair the program will iterate over its resources, e.g. zh-hak_CN.
You certainly do not want to replicate the entire Chinese language resources
for Hakka. Taiwanese Hakka would end up being zh-hak_TW. Total-replacement
639-2 languages are specified without a sub-language, like haw_US.
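
One way to picture the resource iteration is as a lookup chain from most specific to least specific. The Python sketch below is only an assumption about how that could work: the function name resource_chain and the ordering (sub-language before parent language, country-specific before generic) are hypothetical and not fixed by this proposal.

```python
from typing import List


def resource_chain(locale: str) -> List[str]:
    """List resource names to try, most specific first (assumed ordering)."""
    language, _, rest = locale.partition("_")
    country = rest.split("_", 1)[0]          # any variant is ignored in this sketch
    parent, _, sub = language.partition("-")
    languages = [language, parent] if sub else [language]

    chain: List[str] = []
    for lang in languages:
        if country:
            chain.append(lang + "_" + country)
        chain.append(lang)
    return chain


print(resource_chain("zh-hak_CN"))   # Hakka resources first, then general Chinese
print(resource_chain("haw_US"))      # total replacement language: no parent to consult
```

With a chain like this, the Hakka resources only need to hold what differs from the general Chinese ones.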

5) Convert all non-human locales ("C" and "POSIX") to human locales, e.g.
en_US.
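
As a small sketch of this last point, the conversion could be a simple table lookup. The name humanize_locale is hypothetical, and en_US as the target is just the example given above.

```python
# Non-human locale names and the human locale that stands in for them (assumed).
NON_HUMAN_LOCALES = {"C": "en_US", "POSIX": "en_US"}


def humanize_locale(name: str) -> str:
    """Map "C" and "POSIX" to a human locale; leave other names unchanged."""
    return NON_HUMAN_LOCALES.get(name, name)


print(humanize_locale("POSIX"))
print(humanize_locale("fr_FR"))
```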

Carl


