Re: New RFC 4645-4647 (language tags)

From: Doug Ewell
Date: Tue Sep 12 2006 - 02:56:39 CDT

    Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    >> Please read the deeper explanation of this at the official ISO 639-3
    >> Web site, and then read RFC 4646 again, and come back when you are
    >> more familiar with the underlying concepts.
    > You did not need to explain that, I know all that. I don't see why you
    > want to contradict me there, given that I just said I was surprised to
    > see that ISO 4646 provisions are made for future (unknown) extensions
    > of ISO 639 (with 4-letter codes, but there's not been any work or
    > draft there for such codes), but no provision at all for 3-letter
    > codes that will soon be introduced by ISO 639-3. Instead, ISO 4646
    > just speaks about the existing SIL codes (different from those in the
    > ISO 639-3 draft) and puts them in the private-use "x-*" subtags (and I
    > don't think that ISO 639-3 will be made to be in private use).

    I'm not trying to be rude, but I had already stated that:

    (a) RFC 4646 could not refer normatively to ISO 639-3 since that
    standard was not yet released;

    (b) we are already in the process of updating RFC 4646 to deal
    specifically with ISO 639-3 once it is released.

    The SIL example used to illustrate private-use subtags was just an
    example. You can put anything there, as long as it is valid

    Also, judging from your comment above, you may not be aware that work is
    indeed being done on a new part of ISO 639 that will feature 4-letter
    code elements. This is ISO 639-6. The editor of that project is Debbie

    > And I have read the ISO 639-3 draft since long (it's much simpler than
    > ISO 4646).

    It covers less. :-)

    > But in fact there are macrolanguages or collections too in ISO 639-2
    > (you cite Chinese as a good example, but there are others like codes
    > for language families, or subfamilies grouping all languages of a
    > family not listed separately; and another good example in ISO 639-3 is
    > the various codes assigned to Arabic regional dialects, or the many
    > codes assigned to Quechua and aboriginal Amerindian languages, or
    > codes for Buryat, which are weakly coded and grouped in ISO 639-2
    > under the same code; they are not technically families but just
    > collections in ISO 639-2, and ISO 639-3 assigns them this "collection"
    > semantic).

    Collections in ISO 639-2 and macrolanguages in ISO 639-3 are not at all
    the same thing.

    > Essentially, I see ISO 639-3 not as a separate standard, but as an
    > extension made to deprecate ISO 639-2 (even if ISO 639-2 is kept
    > because it will be compatible for most codes, except possibly for some
    > poor codes for "Other xxx languages" where xxx is a family).

    All of the ISO 639-* standards will belong to a single family, and each
    will have its particular use. No part truly deprecates another. There
    are some applications that make use of the ISO 639-2 collection codes.

    > Now the way we will manage language collections in language tags is
    > still unspecified: Will we codify the refined language only, or the
    > existing ISO 639-2 collection then the refined language code as a
    > language extension? This is still not specified in ISO 4646 (even in
    > the case of collections or macrolanguages that already exist in ISO
    > 639-2 and for which language extensions are used: "zh-min-nan" or the
    > ISO 639-3 single code?)

    Anything that deals specifically with ISO 639-3, which is not yet
    released, could not be specified in RFC 4646.

    RFC 4646bis will recommend that the most accurate language subtag be
    used, which means a subtag for a specific language is generally better
    than one for a collection.

    Macrolanguages, which are not collections, are handled differently.

    The tag "zh-min-nan", which was registered under RFC 3066, had to be
    grandfathered into RFC 4646 so that it would still be valid (like all
    RFC 3066 tags). It is not otherwise valid under RFC 4646 because no
    extended language subtags are defined, and it will not be otherwise
    valid under RFC 4646bis because the extlang "min" does not apply to
    Chinese. It will be deprecated in RFC 4646bis in favor of "zh-nan",
    which is valid.
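
    To make the distinction concrete, here is a minimal sketch of how a
    tag checker might separate grandfathered tags from private-use
    sequences. The function name classify() and the tag "x-whatever" are
    hypothetical, and the grandfathered set is deliberately abbreviated;
    a real implementation would load the full IANA registry and parse
    the complete RFC 4646 grammar.

```python
import re

# Abbreviated for illustration: RFC 4646 grandfathers every tag that was
# registered under RFC 3066. A real checker would read the IANA registry.
GRANDFATHERED = {"zh-min-nan", "zh-min", "i-klingon", "art-lojban"}

# Private-use sequence per the RFC 4646 ABNF: the singleton "x" followed by
# one or more subtags of 1 to 8 alphanumeric characters.
PRIVATE_USE = re.compile(r"x(-[a-z0-9]{1,8})+")


def classify(tag: str) -> str:
    """Return 'grandfathered', 'private-use', or 'other' for a tag.

    Hypothetical helper: 'other' means the tag needs full subtag-by-subtag
    validation against the registry, which this sketch does not attempt.
    """
    t = tag.lower()  # language tags are compared case-insensitively
    if t in GRANDFATHERED:
        return "grandfathered"
    if PRIVATE_USE.fullmatch(t):
        return "private-use"
    return "other"


if __name__ == "__main__":
    for tag in ("zh-min-nan", "x-whatever", "zh-nan"):
        print(tag, "->", classify(tag))
```

    The grandfathered check comes first because those tags are valid
    only as whole strings, not because their individual subtags would
    pass validation.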

    This thread is way off topic for the Unicode and Unicore lists. Please
    join the LTRU list and inquire there if you have additional questions.

    Doug Ewell
    Fullerton, California, USA
    RFC 4645  *  UTN #14
