Re: Question about new locale language tags

From: Philippe Verdy
Date: Sat Dec 23 2006 - 18:06:59 CST


    I did not mix up the new concept of "extlang" subtags (which come between the primary "lang" subtag and the "script" and "region" subtags) with the new concept of extended language subtags (which come at the end of the full language tag, each one with a single-letter prefix, as described in RFC4646).

    I don't know why you think I have mixed the two concepts which are already clearly defined in RFC4646, where the formal syntax of "extlang" subtags is also defined unambiguously.

    I really meant the "extlang", as defined in the current release of RFC4646. And RFC4646 does not give any meaning, for now, to "extlang", except that it indicates it may be used along with the future release of ISO639-3 (but there's NO policy associated, so I still wonder if extlang subtags will really have to be unique or if their meaning will depend on the primary lang subtag).

    So for me, according to RFC4646, the "zh-min" full language tag still means something different from the possible "min" full language tag (that may eventually be built using the "min" language code standardized later in ISO 639, provided that it is registered in the language tag registry).

    "Chinese" (zh) is already a macrolanguage and I don't think this will change (it won't change after ISO639-3 is released, because ISO 639-3 won't assign any 2-letter codes, and the "RFC4646bis" draft does not add more meanings to existing macrolanguages).

    Using the existing RFC4646 canonicalization rules for language tags, you can't remove an extlang subtag from a full language tag like "zh-min-nan" to make it mean the same as "zh-nan", and you can't drop the primary language subtag to make it like "nan" or "min", unless there's a registration for compatibility.

    Using the RFC4646 formal syntax, "min" and "nan" are separate extlang subtags in a language tag like "zh-min-nan", independently of what the full tag means, and they can't be reordered after canonicalization.
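
    To make the syntactic point concrete, here is a minimal sketch in Python that splits a tag according to my reading of the RFC4646 "langtag" grammar (language, up to three 3-letter extlang subtags, then script, region and variants). The name parse_core is only illustrative. It checks well-formedness only: it does not consult the registry, and it ignores extensions, private use and grandfathered registrations.

```python
import re

# Core of the RFC 4646 "langtag" production (no extensions/private use).
# Assumption: the tag is lowercased first, since case is not significant.
LANGTAG_CORE = re.compile(
    r"""^
    (?P<language>[a-z]{2,3})                 # primary language subtag
    (?P<extlang>(?:-[a-z]{3}){0,3})          # up to three extlang subtags
    (?:-(?P<script>[a-z]{4}))?               # ISO 15924 script
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?      # ISO 3166 / UN M.49 region
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    $""",
    re.VERBOSE,
)


def parse_core(tag):
    """Split a tag into its syntactic parts; raise ValueError if it does not fit."""
    match = LANGTAG_CORE.match(tag.lower())
    if match is None:
        raise ValueError(f"not a well-formed core langtag: {tag!r}")
    return {
        "language": match["language"],
        "extlang": [s for s in match["extlang"].split("-") if s],
        "script": match["script"],
        "region": match["region"],
        "variants": [s for s in match["variants"].split("-") if s],
    }


for example in ("zh-min-nan", "zh-nan", "min"):
    print(example, parse_core(example))
```

    With this reading, "zh-min-nan" parses as language "zh" with extlang subtags "min" and "nan" (in that order), "zh-nan" has the single extlang "nan", and "min" alone is a primary language subtag with no extlang at all. The three tags are structurally distinct, whatever they end up meaning.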

    It is most probable that RFC4646bis will include a registration requirement for extlang subtags, and this registration will include validity limits (similar to the validity rules for the orthographic year "variants" of German) which will define the context in which the registered extlang is valid. I think there will also be a sort of "Suppress-Extlang:" field to allow language tags not to specify the default ISO639-3 language code with which a macrolanguage tag is associated (for example "zh-cmn" may become equivalent to "zh" for meaning Mandarin, if the "cmn" extlang subtag is registered in the validity context of the "zh" language tag, and the "zh" registration is updated with an additional "Suppress-Extlang: cmn" registration field).
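
    As an illustration of the rule I am imagining, here is a small Python sketch. Everything in it is hypothetical: the "Suppress-Extlang" field does not exist in RFC4646, and the REGISTRY dictionary and the function name are invented for the example.

```python
# Hypothetical registry records. "Suppress-Extlang" is speculation,
# not a field defined by RFC 4646.
REGISTRY = {
    "zh": {"Type": "language", "Suppress-Extlang": "cmn"},
}


def suppress_default_extlang(tag):
    """Drop the first extlang if the primary subtag declares it as its default.

    The result is returned in lowercase, since case is not significant.
    """
    subtags = tag.lower().split("-")
    default = REGISTRY.get(subtags[0], {}).get("Suppress-Extlang")
    # Only the subtag directly after the primary language is considered.
    if default is not None and len(subtags) > 1 and subtags[1] == default:
        del subtags[1]
    return "-".join(subtags)


print(suppress_default_extlang("zh-cmn"))      # zh
print(suppress_default_extlang("zh-min-nan"))  # zh-min-nan
```

    Under this assumption only "cmn" is dropped after "zh"; a tag like "zh-min-nan" is left untouched because "min" is not the declared default.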

    But I really doubt that the "min" extlang subtag (coded from ISO639-3) will be suppressible in any context other than that of a "min" primary lang tag, if "min" is also registered as a possible primary lang tag. I really doubt that registration too, as it would make existing compatibility tags like "zh-min", which are in canonical form today, non-canonical later. In fact, if ISO639-3 language codes are allowed at the primary lang level, then it would most probably be registered as a compatibility "lang" tag in non-canonical form. Regarding the case of "min", it is evident that such an equivalence will be impossible, given that "min" designates another language (Minangkabau) than the family of Min dialects in the Chinese macrolanguage.

    All this is speculation, I confess, but compatibility is possible using such a scheme, which is already introduced in the existing released RFC4646.

    For now, I see absolutely no use of "extended" subtags for designating any language, but only for locale identifiers (for example to specify a currency, or a date format convention, or number formats):
    * extended subtags are designed today as a way to pack several locale tags into a single unordered list, whose items are each qualified by a single letter in [a-hj-wyz], except the first item of the list, which has no single-letter qualifier and which encodes: the language itself (macrolanguage code in ISO 639-1/2, and optional extlang codes, or grandfathered IANA codes starting with "i-"), the ISO15924 script, the ISO3166 region code, and registered variant codes.
    * The last (optional) item in the list is the full string that follows the "x-" prefix for private user-assigned codes.
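
    The layout described in these two items can be sketched in Python as follows. This is only a splitter, not a validator: it does not check subtag lengths or the registry, and the singleton "a" used in the example is hypothetical (no such extension is registered, as far as I know).

```python
def split_sections(tag):
    """Split a tag into (core subtags, {singleton: subtags}, private-use subtags)."""
    core, extensions, private_use = [], {}, []
    target = core
    for position, subtag in enumerate(tag.lower().split("-")):
        if target is private_use:
            target.append(subtag)          # everything after "x" is private use
        elif subtag == "x":
            target = private_use
        elif len(subtag) == 1 and position > 0:
            if subtag in extensions:       # a singleton may appear only once
                raise ValueError(f"repeated singleton {subtag!r} in {tag!r}")
            target = extensions.setdefault(subtag, [])
        else:
            target.append(subtag)          # includes a leading "i" (grandfathered)
    return core, extensions, private_use


core, extensions, private_use = split_sections("en-Latn-US-a-abc-def-x-my-data")
print(core)         # ['en', 'latn', 'us']
print(extensions)   # {'a': ['abc', 'def']}
print(private_use)  # ['my', 'data']
```

    The first section carries the language, script, region and variants; each later section is keyed by its single-letter prefix; and whatever follows "x" is kept as one private-use sequence.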

    So, what did I miss really?

    ----- Original Message -----
    From: "Doug Ewell" <>
    To: "Unicode Mailing List" <>
    Cc: "Philippe Verdy" <>; "Peter Constable" <>
    Sent: Friday, December 22, 2006 7:41 PM
    Subject: Re: Question about new locale language tags

    > Philippe,
    > You have missed the entire long thread of discussion in the Language Tag
    > Registry Update (LTRU) working group concerning extended language
    > subtags, what they are, and why they are that way. They are correlated
    > 1-to-1 with ISO 639-3 code elements that are encompassed by an ISO 639-3
    > macrolanguage. "min" has only one meaning in ISO 639-3, therefore it
    > has only one meaning in RFC 4646bis.
    > I suggest you read the LTRU archives, beginning at this URL, before
    > jumping to conclusions:
    > Be careful not to confuse "extended language subtags" (also known as
    > "extlangs") with "extension subtags." They are completely different.

    This archive was generated by hypermail 2.1.5 : Sat Dec 23 2006 - 18:10:33 CST