Re: Question about new locale language tags

From: Doug Ewell
Date: Tue Dec 19 2006 - 23:21:43 CST

    Arne Götje (高盛華) <arne at linux dot org dot tw> wrote:

    > According to
    > each of the 26 languages (22 of them living, 15 of them important)
    > have all 3 character language codes, for example 'nan' for Minnan,
    > 'hak' for Hakka, 'ami' for Amis, ...
    > Can we use these language codes to form new locales, like 'nan_TW',
    > 'hak_TW', 'ami_TW', etc.? Or does anything speak against this
    > practice?

    Normally it's recommended to wait until ISO 639-3 is published and then
    use those codes instead of the Ethnologue codes (which might not match
    100%). In the case of these languages, there happens to be 100%

    IETF language tags (not the same thing as locales) will most likely
    implement Hakka, Mandarin, Min Nan, and Taiwan Sign Language as
    "extended language subtags," meaning (for example) that Min Nan will be
    encoded in language tags as "zh-nan" instead of "nan." As a designer of
    locale information, you are probably free to use either the ISO 639-3
    code directly or the language tag, at least until a consensus develops
    to use one or the other.
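
    The two spellings discussed above can be compared side by side. The
    following is only a rough sketch: the function names and the small
    MACROLANGUAGE table are hypothetical, cover just the codes mentioned in
    this thread, and do not consult any registry. It assumes the "zh-nan"
    extended-language form described above is the one adopted.

```python
# Illustrative only: this mapping is an assumption covering the Han Chinese
# codes named in this thread (Min Nan, Hakka, Mandarin), not a registry lookup.
MACROLANGUAGE = {"nan": "zh", "hak": "zh", "cmn": "zh"}


def locale_name(lang, region):
    """Build a POSIX-style locale name from a language code and a region."""
    return f"{lang}_{region}"


def extlang_tag(lang, region=None):
    """Build a language tag, prefixing the macrolanguage when one is listed."""
    prefix = MACROLANGUAGE.get(lang)
    parts = [prefix, lang] if prefix else [lang]
    if region:
        parts.append(region)
    return "-".join(parts)


if __name__ == "__main__":
    for code in ("nan", "hak", "ami"):
        print(locale_name(code, "TW"), extlang_tag(code, "TW"))
```

    A code with no entry in the table, such as 'ami', is passed through
    unchanged, so the locale name and the language tag differ only in the
    separator.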

    Michael Maxwell replied:

    > This is presumably ISO 639-2, which had many such problems.

    Considering the various Han Chinese languages as a single "Chinese
    language" is by no means a unique "problem" of ISO 639-1 and 639-2.
    Many people, including the majority of Chinese, share this view.

    Since ISO 639-3 incorporates all of the non-collective codes from ISO
    639-2, it includes "zho" as a "macrolanguage" encompassing the
    individual Han Chinese languages, while still retaining the status of a
    language itself. The question of whether Chinese is one language or
    several is a complex one, and usually not best understood by dismissing
    one view or the others as a "problem."

    What other "problems" of this sort are supposed to be present in ISO

    > ISO 639-3 is based on the Ethnologue codes (with some modifications),
    > plus codes for long extinct and made-up languages (including
    > everyone's favorite, Klingon).

    1. Encoding extinct languages is a design goal for ISO 639-3, not an

    2. All languages are "made-up"; they are human inventions and do not
    occur in nature. Constructed languages such as Esperanto and Ido are
    also present in ISO 639-1 and -2.

    3. Klingon is also present in ISO 639-2. It has more speakers than

    Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14