Re: New RFC 4645-4647 (language tags)

From: Philippe Verdy (
Date: Mon Sep 11 2006 - 02:51:27 CDT


    From: "Doug Ewell" <>
    >> * RFC 4646 (BCP 47, obsoletes RFC 3066) : Tags for Identifying
    >> Languages

    > Not on the public Unicode list. They were announced on the UnicoRe
    > (members) list a few days ago. There was a brief delay between the RFC
    > Editor's announcement and the time the RFCs were actually available
    > online.
    > They're not a Unicode project, of course, but people on this list who
    > are interested in internationalization issues in general will probably
    > be interested.

    RFC 4646 is really a magnificent construction, given the complexity of preserving compatibility with legacy tags and the roles of the various registration agencies. I had already seen the first beta version of the ILSR on the IANA website, but this model now gives a clear understanding of how to manage language tags and, on its own, solves most of the problems caused by "equivalent" codes and how to canonicalize them.

    It is a very long read, with many subtle details. Let's hope that the IANA registry will now be fully completed, notably the "Suppress-Script:" field (still missing for many obvious languages), the "Description:" field (whose value should match the other standards), and the "Preferred-Value:" mappings for equivalences.
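    To illustrate how the "Preferred-Value:" and "Suppress-Script:" fields mentioned above could be applied, here is a minimal Python sketch. The two dictionaries are a tiny hand-copied excerpt, not the full registry, and the function name simplify_tag is hypothetical. It only handles language, script, and region subtags, and dropping a suppressed script is a recommendation of the RFC rather than part of its formal canonicalization rules.

```python
# Tiny illustrative excerpt of registry fields; a real tool would parse
# the full IANA Language Subtag Registry file instead.
PREFERRED_VALUE = {"iw": "he", "in": "id", "ji": "yi"}
SUPPRESS_SCRIPT = {"en": "Latn", "he": "Hebr", "ru": "Cyrl"}


def simplify_tag(tag: str) -> str:
    """Apply Preferred-Value and Suppress-Script to a simple language tag.

    Assumes the tag has the shape language[-script][-region]; extended
    language subtags, variants, extensions and private use are not handled.
    """
    subtags = tag.lower().split("-")

    # Replace a deprecated primary language subtag by its preferred value.
    subtags[0] = PREFERRED_VALUE.get(subtags[0], subtags[0])

    # Drop a script subtag that the registry marks as redundant.
    if len(subtags) > 1 and len(subtags[1]) == 4 and subtags[1].isalpha():
        if SUPPRESS_SCRIPT.get(subtags[0], "").lower() == subtags[1]:
            del subtags[1]

    # Restore conventional casing: "Latn" for scripts, "US" for regions.
    result = [subtags[0]]
    for subtag in subtags[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            result.append(subtag.title())
        elif len(subtag) == 2 and subtag.isalpha():
            result.append(subtag.upper())
        else:
            result.append(subtag)
    return "-".join(result)
```

    With this excerpt, "iw-IL" becomes "he-IL" and "en-Latn-US" becomes "en-US", while "sr-Latn-CS" is left alone because no suppressed script is listed for "sr" here.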

    What surprises me is that RFC 4646 reserves space for future ISO 639 extensions, but only as 4-letter codes:
    * 3-letter codes are explicitly restricted _only_ to ISO 639-2,
    * but not opened to ISO 639-3 (which extends the set of 3-letter codes), so most of the new 3-letter ISO 639-3 codes won't be usable as primary language subtags.
    So RFC 4646 is already almost deprecating the now very advanced ISO 639-3 work (whose core text was adopted long before RFC 4646, even though the associated database is still in beta), just a few months before it is finalized. Applications of ISO 639-3 are already being developed and deployed: how will those applications now be compatible with the new RFC 4646?

    I think provisions for compatibility with ISO 639-3 should have been kept in RFC 4646. With the current RFC text, the new 3-letter ISO 639-3 codes will be usable only as extended language subtags (following a generic ISO 639-2 or ISO 639-1 language subtag), unless these new codes are later imported into ISO 639-2!
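    The length-based rules discussed above can be sketched as a small Python classifier. This is only an illustration of my reading of the RFC: the function name classify_primary_subtag and the sample set of ISO 639-2 codes are hypothetical, the 2-letter case is not checked against actual ISO 639-1 assignments, and a real implementation would look subtags up in the IANA registry.

```python
import re

# Tiny illustrative sample of ISO 639-2 codes that have no 2-letter
# equivalent; the real list would come from the IANA registry.
ISO_639_2_SAMPLE = frozenset({"haw", "tlh", "gsw"})


def classify_primary_subtag(subtag: str, iso639_2=ISO_639_2_SAMPLE) -> str:
    """Classify a candidate primary language subtag by its length."""
    s = subtag.lower()
    if not re.fullmatch(r"[a-z]{2,8}", s):
        return "invalid"
    if len(s) == 2:
        return "iso-639-1"      # membership in ISO 639-1 is not checked here
    if len(s) == 3:
        if "qaa" <= s <= "qtz":
            return "private-use"
        # Only codes known to ISO 639-2 qualify; ISO 639-3-only codes do not.
        return "iso-639-2" if s in iso639_2 else "unavailable"
    if len(s) == 4:
        return "reserved"       # held back for future standardization
    return "registered"         # 5 to 8 letters: direct IANA registration
```

    For example, "haw" is classified as "iso-639-2", whereas "yue" (a code that exists only in ISO 639-3) comes back as "unavailable", which is exactly the gap described above.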
