RE: new version of BCP 47: language identifiers

From: Phillips, Addison (
Date: Thu Jun 25 2009 - 11:43:17 CDT


    Note: the registry will be updated first. It usually takes the RFC Editor a while to publish the draft, whereas the registry conversion will probably happen sometime in the next couple of weeks.


    Addison Phillips
    Globalization Architect -- Lab126

    Internationalization is not a feature.
    It is an architecture.

    From: [] On Behalf Of Mark Davis
    Sent: Thursday, June 25, 2009 8:51 AM
    To: Unicode
    Subject: new version of BCP 47: language identifiers

    The newest version of BCP 47 for language identifiers has just been approved, after a 3-year slog! I don't know how long it will be until it is published, which will involve:

     * the spec at being updated to (plus editing below), and
     * the registry at being updated to
    But people can start the ball rolling on various upgrades where needed. There is a simple utility on for going from language identifiers (language tags) to their components. We'll also be updating the next version of CLDR (see the draft at

    On Thu, Jun 25, 2009 at 07:16, The IESG <<>> wrote:
    The IESG has approved the following document:

    - 'Tags for Identifying Languages '
      <draft-ietf-ltru-4646bis-23.txt> as a BCP

    This document is the product of the Language Tag Registry Update Working Group.

    The IESG contact persons are Alexey Melnikov and Lisa Dusseault.

    A URL of this Internet-Draft is:

    Technical Summary

     This document describes the structure, content, construction, and
     semantics of language tags for use in cases where it is desirable to
     indicate the language used in an information object. It also
     describes how to register values for use in language tags and the
     creation of user-defined extensions for private interchange.
     This document is an update of RFC 4646. The main change is the
     addition of thousands of three-letter language subtags for languages
     for which tagging was not possible up to now. Also, the registry
     format and procedures were adjusted to deal with this change,
     and to reflect experience from current practice.
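The Technical Summary refers to the structure of a language tag. As a rough illustration of splitting a tag into its components (similar in spirit to the utility Mark mentions, whose code is not shown here), here is a minimal Python sketch. It assumes the `langtag` syntax of the approved draft, checks well-formedness only (no registry lookups), lowercases everything, and does not handle grandfathered tags such as `i-klingon` or private-use-only tags. `LANGTAG` and `parse_tag` are hypothetical names.

```python
import re

# Well-formedness check for the "langtag" production only: grandfathered tags
# (e.g. "i-klingon") and private-use-only tags ("x-...") are not handled, and
# subtags are not checked against the registry.
LANGTAG = re.compile(
    r"""
    (?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}  # 2-3 letters plus up to 3 extlangs
                |[a-z]{4}|[a-z]{5,8})          # reserved / registered languages
    (?:-(?P<script>[a-z]{4}))?
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    (?P<extensions>(?:-[a-wyz0-9](?:-[a-z0-9]{2,8})+)*)  # singleton is anything but "x"
    (?:-(?P<privateuse>x(?:-[a-z0-9]{1,8})+))?
    """,
    re.VERBOSE,
)


def parse_tag(tag: str) -> dict:
    """Split a well-formed language tag into its components (lowercased)."""
    m = LANGTAG.fullmatch(tag.lower())  # subtags are case-insensitive
    if m is None:
        raise ValueError(f"not a well-formed langtag: {tag!r}")
    language, *extlangs = m["language"].split("-")
    extensions: dict = {}
    singleton = None
    for part in m["extensions"].split("-")[1:]:
        if len(part) == 1:  # a singleton starts a new extension
            singleton = part
            extensions[singleton] = []
        else:
            extensions[singleton].append(part)
    return {
        "language": language,
        "extlangs": extlangs,
        "script": m["script"],
        "region": m["region"],
        "variants": m["variants"].split("-")[1:],
        "extensions": extensions,
        "privateuse": m["privateuse"],
    }
```

For example, `parse_tag("zh-yue-Hant-HK")` gives language `zh`, extlangs `["yue"]`, script `hant`, and region `hk`. Because the match is anchored with `fullmatch`, the regex backtracks when a greedy extlang guess (such as `lat` in `sr-latn-rs`) leads to a dead end.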

    Working Group Summary

     The WG process for this document was mostly smooth and revolved
     around details. There were some highly contentious issues, but
     for all of them, a solution was found that was acceptable to
     the involved parties and works for all scenarios identified.

    Document Quality

     The IANA Language Subtag Registry, and the language tags that can
     be formed according to this document and its predecessor, are widely
     used across the Internet to identify languages, both in implementations
     (code) and in a wide range of data.


     Martin J. Dürst is the document shepherd. Alexey Melnikov
     is the responsible AD.

    RFC Editor Note

     Please move the reference to RFC 2028 to the Informative section.

     The document has several references to BCP 47. RFC Editor
     should check if they are appropriate and how to represent them better.

     There are several cases of mismatched singulars and plurals
     in the document, so RFC Editor might want to check for these.

     Please replace the last paragraph of section 6, which currently reads:

      The registries specified in this document are not suitable for
      frequent or real-time access to, or retrieval, of the full registry
      contents. Most applications do not need registry data at all. For
      others, being able to validate or canonicalize language tags as of a
      particular registry date will be sufficient, as the registry contents
      change only occasionally. Changes are announced to
      <<>>. Changes, or the absence
      thereof, can also easily be detected by looking at the 'File-Date'
      record at the start of the registry, or by using features of the
      protocol used for downloading, without having to download the full
      registry.

     with the following 2 paragraphs:

      The registries specified in this document are not suitable for
      frequent or real-time access to, or retrieval of, the full registry
                                                   ^ ^
      contents. Most applications do not need registry data at all. For
      others, being able to validate or canonicalize language tags as of a
      particular registry date will be sufficient, as the registry contents
      change only occasionally. Changes are announced to
      <<>>. This mailing list is
      intended for interested organizations and individuals, not for bulk
      subscription to trigger automatic software updates. The size of the
      registry makes it unsuitable for automatic software updates.
      Implementers considering integrating the Language Subtag Registry in
      an automatic updating scheme are strongly advised to distribute only
      suitably encoded differences, and only via their own infrastructure,
      not directly from IANA.
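The paragraph above recommends distributing only suitably encoded differences rather than the full registry, without saying what encoding to use. One possible sketch, assuming the registry's record format (records separated by `%%` lines, `Name: value` fields, folded continuation lines beginning with whitespace): parse two snapshots, key each record by its Type and Subtag (or Tag), and report what was added, removed, or changed. The function names and the toy excerpts are hypothetical, not real registry data.

```python
# Hypothetical helpers: parse the registry's record format and compute which
# (Type, Subtag/Tag) records were added, removed, or changed between snapshots.


def parse_records(text: str) -> list:
    records, current, last = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":  # record separator
            records.append(current)
            current, last = {}, None
        elif line[:1] in (" ", "\t") and last:  # folded continuation line
            current[last][-1] += " " + line.strip()
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip()
            current.setdefault(last, []).append(value.strip())  # fields may repeat
    if current:
        records.append(current)
    return records


def index(records: list) -> dict:
    """Key each record by (Type, Subtag or Tag); the File-Date record has no Type."""
    out = {}
    for rec in records:
        if "Type" in rec:
            name = (rec.get("Subtag") or rec.get("Tag") or [""])[0]
            out[(rec["Type"][0], name)] = rec
    return out


def diff(old_text: str, new_text: str):
    old, new = index(parse_records(old_text)), index(parse_records(new_text))
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


# Toy excerpts, not real registry contents.
OLD = "File-Date: 2009-01-01\n%%\nType: language\nSubtag: fr\nDescription: French\n"
NEW = (
    "File-Date: 2009-02-01\n%%\n"
    "Type: language\nSubtag: fr\nDescription: French\n"
    "Comments: an example of a folded\n  comment line\n%%\n"
    "Type: language\nSubtag: yue\nDescription: Yue Chinese\n"
)
print(diff(OLD, NEW))  # ([('language', 'yue')], [], [('language', 'fr')])
```

Only the changed records would then need to be shipped, in whatever encoding the implementer's own update infrastructure uses.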

      Changes, or the absence thereof, can also easily be detected by
      looking at the 'File-Date' record at the start of the registry, or
      by using features of the protocol used for downloading, without
      having to download the full registry. At the time of publication of
      this document IANA is making the Language Tag registry available
      over HTTP 1.1. The proper way to update a local copy of the Language
      Subtag Registry using HTTP 1.1 is to use a conditional GET [RFC2616].
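The paragraph above mentions two cheap ways to detect change: reading the 'File-Date' record, and an HTTP/1.1 conditional GET. A minimal Python sketch of both, using only the standard library: the registry URL is elided in this message, so it is a parameter here, and `file_date`, `build_request`, and `fetch_if_changed` are hypothetical names. The caller is assumed to have stored the `Last-Modified` and `ETag` values from its previous download.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def file_date(registry_text: str) -> str:
    """Return the value of the 'File-Date' record at the start of the registry."""
    name, _, value = registry_text.lstrip().splitlines()[0].partition(":")
    if name.strip() != "File-Date":
        raise ValueError("registry does not start with a File-Date record")
    return value.strip()


def build_request(url: str, last_modified: Optional[str] = None,
                  etag: Optional[str] = None) -> urllib.request.Request:
    """Conditional GET: send the validators saved from the previous download."""
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        headers["If-None-Match"] = etag
    return urllib.request.Request(url, headers=headers)


def fetch_if_changed(url: str, last_modified: Optional[str] = None,
                     etag: Optional[str] = None) -> Optional[Tuple[str, dict]]:
    """Return (text, new validators) if the registry changed, or None on 304."""
    try:
        with urllib.request.urlopen(build_request(url, last_modified, etag)) as resp:
            text = resp.read().decode("utf-8")
            validators = {"Last-Modified": resp.headers.get("Last-Modified"),
                          "ETag": resp.headers.get("ETag")}
            return text, validators
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the local copy
            return None
        raise
```

With `urllib`, a 304 Not Modified response surfaces as an `HTTPError`, which is why `fetch_if_changed` catches it and returns `None` instead of re-downloading.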

     Please add RFC 2616 to the list of Informative references.

     Please change Mark Davis's email address to <>.

     Please insert a new section 3.9 that reads:

    3.9. Applicability of the Subtag Registry

    The Language Subtag Registry is the source of data elements used to
    construct language tags, following rules described in this document.
    Language tags are designed for indicating linguistic attributes of
    various content, including not only text but also most media formats
    such as video or audio. They also form the basis for language and
    locale negotiation in various protocols and APIs.

    The registry is therefore applicable to many applications that need some
    form of language identification, with these limitations:

    - It is not designed to be the sole data source in the creation of a
      language selection user interface. For example, the registry does not
      contain translations for subtag descriptions or for tags composed from the
      subtags. Sources for localized data based on the registry are generally
      available, notably [CLDR]. Nor does the registry indicate which subtag
      combinations are particularly useful or relevant.

    - It does not provide information indicating relationships between
      different languages, such as might be used in a user interface to select
      language tags hierarchically, regionally, or on some other organizational
      basis.

    - It does not supply information about potential overlap between
      different language tags, as the notion of what constitutes a language is
      not precise: several different language tags might be reasonable choices
      for the same given piece of content.

    - It does not contain information about appropriate fallback choices
      when performing language negotiation. A good fallback language might be
      linguistically unrelated to the specified language. The fact that one
      language is often used as a fallback language for another is usually a
      result of outside factors, such as geography, history, or culture--factors
      which might not apply in all cases. For example, most people who use
      Breton (a Celtic language used in the Northwest of France) would probably
      prefer to be served French (a Romance language) if Breton isn't available.
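The last bullet means negotiation code has to bring its own fallback data. A minimal sketch, assuming Python and a hypothetical application-supplied `FALLBACKS` table: it truncates each requested tag in the style of the RFC 4647 Lookup scheme, which by itself would never get from Breton to French, and then tries the table's entries. Putting each tag's fallbacks right after it (rather than after the user's whole preference list) is a design choice, not something the draft prescribes. `truncations`, `expand`, and `lookup` are hypothetical names.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical, application-supplied fallback data. The registry does not
# provide anything like this.
FALLBACKS: Dict[str, List[str]] = {"br": ["fr"]}


def truncations(tag: str) -> Iterator[str]:
    """Yield tag, then progressively shorter prefixes (RFC 4647 Lookup style)."""
    subtags = tag.lower().split("-")
    while subtags:
        yield "-".join(subtags)
        subtags.pop()
        while subtags and len(subtags[-1]) == 1:  # drop a dangling singleton
            subtags.pop()


def expand(requested: List[str]) -> List[str]:
    """Insert each tag's application-defined fallbacks right after it."""
    out: List[str] = []
    for tag in requested:
        out.append(tag)
        out.extend(FALLBACKS.get(tag.lower().split("-")[0], []))
    return out


def lookup(requested: List[str], available: List[str],
           default: Optional[str] = None) -> Optional[str]:
    """Return the first available tag matched by the expanded priority list."""
    by_lower = {t.lower(): t for t in available}
    for tag in expand(requested):
        for candidate in truncations(tag):
            if candidate in by_lower:
                return by_lower[candidate]
    return default
```

For example, `lookup(["br-FR"], ["fr", "en"])` returns `"fr"`; with an empty `FALLBACKS` table it would return `None`, since truncation only ever reaches `br`.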


