Re: Same language, two locales (RE: Locale string for

From: Michael (michka) Kaplan
Date: Sat Sep 02 2000 - 03:41:21 EDT

And then if you look at the Windows platform, each supported language is
assigned a locale ID (LCID), a number that the Platform SDK documents as
encoding the language, the country, and the sort order.
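
To make that layout concrete, here is a rough sketch in Python. It assumes
the bit layout given by the MAKELCID, PRIMARYLANGID and SUBLANGID macros in
the Platform SDK headers; the function name split_lcid is made up for this
illustration and is not part of any Windows API.

```python
def split_lcid(lcid):
    """Split a Windows LCID into its bit fields (sketch, not an SDK call)."""
    lang_id = lcid & 0xFFFF  # low 16 bits: the language identifier
    return {
        # bits 16-19: sort order (0 is the default sort)
        "sort_id": (lcid >> 16) & 0xF,
        # top 6 bits of the language ID: regional variant, roughly the country
        "sublanguage": lang_id >> 10,
        # low 10 bits of the language ID: the language itself
        "primary_language": lang_id & 0x3FF,
    }


# 0x0409 is English (United States) with the default sort order.
print(split_lcid(0x0409))
```

The country is not stored as an ISO code; it is implied by the sublanguage
value, so turning it into a country name still takes an NLS lookup.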

The locale ID can be used for many purposes, the most important being
collation. It also becomes a key value into the NLS database and can be used
to obtain all sorts of information relevant to the locale.

You can see the Windows 2000 LCID list (more get added for each version of
Windows) with a small sample of information that can be retrieved at:

(The list is in ISO 639-1/ISO 3166-1 order, by popular request).

You can use MLang, the MultiLanguage object, to actually do all sorts of
mappings between LCIDs and other methods of referring to locales.
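
Just to show the kind of mapping I mean, here is a small Python sketch. It
does not call MLang at all: the table is hand-built from a handful of
well-known LCIDs, and the names LCID_TO_TAG, lcid_to_tag and tag_to_lcid are
invented for the example. A real application would ask MLang or the NLS
database instead.

```python
# Hypothetical hand-built table; the real list is far longer.
LCID_TO_TAG = {
    0x0409: "en-US",
    0x0809: "en-GB",
    0x040C: "fr-FR",
    0x0407: "de-DE",
    0x0411: "ja-JP",
}

# Reverse table, keyed on the lowercased tag so lookups ignore case.
TAG_TO_LCID = {tag.lower(): lcid for lcid, tag in LCID_TO_TAG.items()}


def lcid_to_tag(lcid):
    """Return the language-country tag for an LCID, or None if unknown."""
    # Mask off the sort bits so alternate sorts map to the same tag.
    return LCID_TO_TAG.get(lcid & 0xFFFF)


def tag_to_lcid(tag):
    """Return the default-sort LCID for a tag, or None if unknown."""
    return TAG_TO_LCID.get(tag.lower())
```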


----- Original Message -----
From: "Doug Ewell"
To: "Unicode List"
Sent: Friday, September 01, 2000 9:11 PM
Subject: RE: Same language, two locales (RE: Locale string for

> /|/|ike Ayers wrote:
> > BTW, I've gotten confused during this thread over the naming of
> > country codes, etc. There are ISO specs, RFCs, POSIX specs (and
> > more?)... Is this information conveniently summarized anywhere so
> > that I may enlighten myself?
>
> Here's a convenient, if perhaps oversimplified, summary.
>
> The standard for two-letter language codes is ISO 639-1. There is also
> an ISO 639-2 (actually, there are two variants) that specifies three-
> letter language codes.
>
> The standard for two-letter country codes is ISO 3166-1, which also
> specifies collections of three-letter and numeric country codes. ISO
> 3166-2 specifies political subdivisions within a country.
>
> RFC 1766 describes a way to use ISO 639-1 and 3166-1 to create language
> tags for use on the Internet (e.g. in mail messages). A lowercase 639-1
> language tag can be followed by a hyphen and an uppercase 3166-1 country
> code to represent the concept of "language X as spoken in country Y."
> Unicode Technical Report #7, "Plane 14 Characters for Language Tags,"
> recommends a slight adaptation of the RFC 1766 approach (both codes are
> lowercase).
>
> RFC 1766 is currently being revised to allow three-letter (639-2), as
> well as two-letter (639-1), language codes. This will permit the use
> of language tags for hundreds of less-common languages that have no two-
> letter code. The revision will also provide ways to use 3166-2 country-
> subdivision codes and (draft) ISO 15924 script codes in language tags.
> Naturally, the revised version will not be called RFC 1766, but will be
> assigned a new number. I don't know if UTR #7 will be updated to refer
> to the new RFC when it is published (I think it should be).
>
> POSIX locale names are also formed from 639-1 language codes and 3166-1
> country codes. Unlike in RFC 1766, the elements are separated by an
> underscore rather than a hyphen. POSIX uses this language/country code
> to represent not only the language and local dialect, but all the
> attributes of a locale setting, such as decimal separator, thousands
> separator, currency symbol, default date format, etc. It is widely
> regarded as inadequate for covering even a reasonable subset of locale
> possibilities.
>
> There are other standards for language and country codes, but for our
> purposes these are by far the most common.
>
> -Doug Ewell
> Fullerton, California
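
The three spellings described in the quoted summary differ only in case and
separator, which a short Python sketch can show. It handles nothing beyond
the simple language[-country] form: no three-letter codes, script codes or
subdivisions, and no POSIX codeset or modifier suffix. The function names
are invented for the example.

```python
def parse_tag(tag):
    """Split 'en-US' or 'en_US' into (language, country); country may be None."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    country = parts[1].upper() if len(parts) > 1 else None
    return language, country


def to_rfc1766(tag):
    """Lowercase language, hyphen, uppercase country: 'en-US'."""
    language, country = parse_tag(tag)
    return language if country is None else f"{language}-{country}"


def to_posix(tag):
    """Same codes joined by an underscore: 'en_US'."""
    language, country = parse_tag(tag)
    return language if country is None else f"{language}_{country}"


def to_utr7(tag):
    """UTR #7 adaptation: the RFC 1766 form with both codes lowercase."""
    return to_rfc1766(tag).lower()
```

For example, to_posix("fr-ca") gives "fr_CA" and to_utr7("en_US") gives
"en-us".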
