Re: Question about new locale language tags

From: Doug Ewell (
Date: Wed Dec 27 2006 - 01:52:24 CST

    Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    > I did not mix the new concept of "extlang" subtags...
    > I don't know why you think I have mixed the two concepts...
    > I really meant the "extlang", as defined in the current release of
    > RFC4646.

    Other people have confused these two similarly-named concepts often
    enough in the past that I usually make a point to mention them. This
    was not in response to anything in Philippe's post, and it wasn't meant
    to be the main focus of my reply.

    > So for me, according to RFC4646, the "zh-min" full language tag still
    > means something different from the possible "min" full language tag
    > (that may eventually be built using the "min" language code
    > standardized later in ISO 639, provided that it is registered in the
    > language tag registry).

    It cannot, because extlangs are derived from ISO 639 code elements just
    as primary language subtags are.

    > Using the existing RFC4646 canonicalization rules for language tags,
    > you can't remove an extlang subtag from a full language tag like
    > "zh-min-nan" to make it mean the same as "zh-nan", and you can't drop
    > the primary language tag to make it like "nan" or "min", unless
    > there's a registration for compatibility.

    "zh-min-nan" is a grandfathered tag. It is well-formed according to the
    syntax of RFC 4646, but it does not mean what the individual pieces
    imply. Trying to dissect it to determine the meaning of either the
    whole or the parts is fruitless.

    > Using the RFC4646 formal syntax, "min" and "nan" are separate extlang
    > subtags in a language tag like "zh-min-nan", independently of what the
    > full tag means, and they can't be reordered after canonicalization.

    "zh-min-nan" is a grandfathered tag. This means it can only be treated
    as an indivisible whole. "min" and "nan" in this context are not
    separate extlang subtags, they are just substrings within the tag.
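
    As a minimal sketch of that rule, assuming a hypothetical helper
    named split_tag and a deliberately partial list of grandfathered
    tags (the authoritative list is the IANA registry), a processor
    would check for a whole-tag match before splitting on hyphens:

```python
# Partial, illustrative list of grandfathered tags; not the full registry.
GRANDFATHERED = {"zh-min-nan", "zh-min", "zh-guoyu", "i-klingon", "i-default"}


def split_tag(tag):
    """Return the units a processor may reason about for this tag.

    A grandfathered tag comes back as a single indivisible unit;
    any other tag is split into its hyphen-separated subtags.
    """
    lowered = tag.lower()  # language tags are case-insensitive
    if lowered in GRANDFATHERED:
        return [lowered]
    return lowered.split("-")
```

    With this sketch, "zh-min-nan" stays a single unit, so "min" and
    "nan" are never looked up as subtags in their own right.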

    > It is most probable that RFC4646bis will include registration
    > requirement for extlang subtags, and this registration will include
    > validity limits (similar to the validity rules for the orthographic
    > year "variants" of German) which will define the context in which the
    > registered extlang is valid. I think there will also be a sort of
    > "Suppress-Extlang:" field to allow language tags to not specify the
    > default ISO639-3 language code to which a macrolanguage tag is
    > associated (for example "zh-cmn" may become equivalent to "zh" for
    > meaning Mandarin, if the "cmn" extlang subtag is registered in the
    > validity context of the "zh" language tag, and the "zh" registration
    > is updated with an additional "Suppress-Extlang: cmn" registration
    > field)

    Since I have been working directly on RFC 4646 and RFC 4646bis for two
    and a half years now, and since I have heard no plans to create
    individually registrable extlang subtags nor to create a
    "Suppress-Extlang" mechanism, I'd be interested to hear why you believe
    it is "most probable" that these features will be added.

    > All this is speculation, I confess it, but compatibility is possible
    > using such scheme which is already introduced in the existing released
    > RFC4646.

    There is really no need to resort to speculation, since all of the
    discussions regarding RFC 4646bis take place in an IETF working group
    that is open to anyone and has publicly available mailing list archives.
    I don't really care if you choose to speculate, but others reading the
    Unicode mailing list may come away misinformed and that does concern me.

    > For now, I see absolutely no use of "extended" subtags for designating
    > any language, but only for locale identifiers (for example to specify
    > a currency, or a date format convention, or number formats):
    > * extended subtags are designed today as a way to pack several locale
    > tags into a single unordered list, whose items are qualified by a
    > single letter in [a-hj-wyz], except the first item of the list which
    > has no single letter qualifier and that encodes: the language itself
    > (macrolanguage code in ISO 639-1/2, and optional extlang codes, or
    > grandfathered IANA codes starting by "i-"), the ISO15924 script, the
    > ISO3166 region code, and registered variant codes.
    > * The last (optional) item in the list is the full string that follows
    > the "x-" prefix for private user-assigned codes.

    All of this is basically true except that grandfathered tags, including
    those that start with "i-", can't have other subtags added to them.
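
    The layout described above, together with that exception, can be
    sketched roughly as follows. The function name split_sections and
    the short grandfathered list are assumptions for illustration, and
    the singleton set (any letter or digit except "x") reflects my
    reading of the RFC 4646 ABNF rather than a validated implementation:

```python
import string

# Partial, illustrative list of grandfathered tags; not the full registry.
GRANDFATHERED = {"zh-min-nan", "zh-min", "i-klingon", "i-default"}
# Assumed singleton set: any ASCII letter or digit other than "x".
SINGLETONS = set(string.ascii_lowercase + string.digits) - {"x"}


def split_sections(tag):
    """Split a tag into (core subtags, {singleton: subtags}, private-use subtags)."""
    lowered = tag.lower()
    if lowered in GRANDFATHERED:
        # A grandfathered tag is one unit and takes no further subtags.
        return [lowered], {}, []
    parts = lowered.split("-")
    if parts[0] == "i":
        raise ValueError("'i-' tags exist only as whole grandfathered registrations")
    core, extensions, private = [], {}, []
    current = core
    for index, part in enumerate(parts):
        if part == "x":
            # Everything after "x" is private use and ends the tag.
            private = parts[index + 1:]
            break
        if index > 0 and len(part) == 1 and part in SINGLETONS:
            # A singleton opens a new extension section.
            current = extensions.setdefault(part, [])
        else:
            current.append(part)
    return core, extensions, private
```

    This only separates the sections; it does not check that any
    subtag is actually registered.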

    > So, what did I miss really?

    The significance of grandfathered tags and the very simple,
    circumscribed rules concerning extlang subtags. There are no extlang
    subtags assigned in the RFC 4646 era. In RFC 4646bis, they will be
    derived directly from ISO 639-3 code elements for languages that are
    encompassed by an ISO 639-3 macrolanguage. They are not individually
    registrable other than by appearing in ISO 639-3, they have one and
    only one meaning, and no extlang can also be a primary language subtag.
    That is not speculation; it is fact.

    Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
