Re: New RFC 4645-4647 (language tags)

From: Philippe Verdy (
Date: Tue Sep 12 2006 - 02:20:45 CDT


    From: "Doug Ewell" <>
    > Philippe,
    > ISO 639-3 will have a number of "macrolanguages," which are broad
    > linguistic concepts like "Chinese" that have more refined linguistic
    > concepts like "Wu" or "Hakka" or "Mandarin" underneath them.
    > Please read the deeper explanation of this at the official ISO 639-3 Web
    > site, and then read RFC 4646 again, and come back when you are more
    > familiar with the underlying concepts.

    You did not need to explain that; I know all of it. I don't see why you want to contradict me there. I had just said I was surprised that RFC 4646 makes provision for future (as yet unknown) extensions of ISO 639 using 4-letter codes, for which no work or draft exists, yet makes no provision at all for the 3-letter codes that ISO 639-3 will soon introduce. Instead, RFC 4646 speaks only about the existing SIL codes (which differ from those in the ISO 639-3 draft) and puts them in the private-use "x-*" subtags (and I don't think ISO 639-3 is meant to be private use).

    And I read the ISO 639-3 draft long ago (it is much simpler than RFC 4646). I did not say that all ISO 639-3 codes had to be treated equally, or that the standard was defined as a partition of languages (as ISO 639-2 attempted to be).

    But in fact ISO 639-2 has macrolanguages or collections too. You cite Chinese as a good example, but there are others, such as codes for language families, or for subfamilies grouping all languages of a family that are not listed separately. Another good example in ISO 639-3 is the various codes assigned to Arabic regional dialects, the many codes assigned to Quechua and aboriginal Amerindian languages, or the codes for Buryat, all of which are weakly coded and grouped under a single code in ISO 639-2. These are not technically families but just collections in ISO 639-2, and ISO 639-3 assigns them this "collection" semantic.

    Essentially, I see ISO 639-3 not as a separate standard but as an extension intended to deprecate ISO 639-2 (even if ISO 639-2 is kept because it will remain compatible for most codes, except possibly for some poor codes for "Other xxx languages", where xxx is a family).

    Now, the way we will manage language collections in language tags is still unspecified: will we encode the refined language only, or the existing ISO 639-2 collection code followed by the refined language code as a language extension? RFC 4646 still does not specify this, even for collections or macrolanguages that already exist in ISO 639-2 and for which language extensions are already used: "zh-min-nan", or the single ISO 639-3 code?
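
    To make the two candidate encodings concrete, here is a rough Python sketch that splits a tag into its primary language subtag, any 3-letter subtags that follow it, and the rest. It is only an illustration under stated assumptions: the function name split_language_subtags is hypothetical, the classification is purely by subtag shape (no registry lookup, no validation), and the bare code "nan" is used for Min Nan as I understand the ISO 639-3 code tables to list it.

```python
import re

# Assumption: a subtag of exactly three letters directly after the primary
# language is treated as an extended language subtag. This is a shape-only
# heuristic, not a check against any registry.
_ALPHA3 = re.compile(r"^[A-Za-z]{3}$")


def split_language_subtags(tag):
    """Split a tag into (primary language, extended language subtags, rest)."""
    subtags = tag.split("-")
    primary = subtags[0].lower()
    extlangs = []
    i = 1
    # Collect at most three consecutive 3-letter subtags after the primary one.
    while i < len(subtags) and len(extlangs) < 3 and _ALPHA3.match(subtags[i]):
        extlangs.append(subtags[i].lower())
        i += 1
    return primary, extlangs, subtags[i:]


# Style 1: collection code first, refined language as an extension.
collection_style = split_language_subtags("zh-min-nan")
# Style 2: the refined language's own 3-letter code, used alone.
refined_style = split_language_subtags("nan")
```

    With the first style a matcher can still fall back to "zh" by truncating the tag; with the second, it would need an external table saying that the refined code belongs to the Chinese collection.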

    This archive was generated by hypermail 2.1.5 : Tue Sep 12 2006 - 02:25:12 CDT