Re: RFC 3066 tags vs. locales (was RE: Common Locale Data Repository Project

From: Michael Everson
Date: Mon Apr 26 2004 - 18:50:43 EDT

    Surely this LONG message belongs on the RFC 3066 list.
    Come ON people.

    At 09:25 -0700 2004-04-26, Peter Constable wrote:
    >I really feel your usage of terminology here is unhelpful -- in very
    >practical ways, unhelpful, because it makes it more difficult to get
    >people to understand how to implement things in the right way.
    >It may be that the application that most interests you is the naming of
    >locales, but that does not change the fact that the notions of "locale"
    >and "language" are different, and that the primary intent of RFC 1766
    >and its successors has always been identification of "languages", as
    >the title and introduction to RFC 3066 indicate:
    >"Tags for the Identification of Languages"
    >"One means of indicating the language used is by labeling the
    >information content with an identifier for the language that is used in
    >this information content."
    >Whether in your broad or narrow sense, a locale is an operational mode
    >of a software application or of a software operating environment to
    >provide culture-dependent tailoring.
    >"Language" in the sense used by RFC 1766/3066 is a
    >linguistically-related attribute of content, and a language identifier
    >is used to label content to indicate that attribute, or to label
    >resources (e.g. spelling checkers) that can appropriately be applied to
    >that content. I think that's stated reasonably clearly in RFC 1766/3066.
    >One should also refer to RFC 2277, IETF Policy on Character Sets and
    >Languages, which clearly distinguishes "language" tags and "locale"
    >tags. In the IETF context, which is the context for RFC 1766/3066, those
    >documents do *not* provide tags for locales; they provide tags
    >for languages.
    >> There is, as I have said, a perfectly reasonable, narrow sense of
    >> locale which is essentially identical to what is captured by RFC 3066.
    >But that does not mean that it's a good thing to refer to RFC 3066 tags
    >as locale identifiers.
    >> And in practice, RFC 3066 is often used with that meaning. I don't
    >> see any need to deny reality (at least not in this area ;-)
    >I think you overstate actual practice: For many years, various software
    >implementations have used combinations of ISO 639-1 language identifiers
    >and ISO 3166 country identifiers joined with an underscore to create
    >locale identifiers; e.g. "en_US". It was not until Microsoft's .Net
    >Framework that locales ('CultureInfo' in that context) have been named
    >using strings that *resemble* RFC 3066 tags -- and it needs to be
    >pointed out that the namespace for CultureInfo.Name is not the same as
    >the RFC 3066 namespace.
    >It may be that you and some others have come to refer to RFC 3066 tags
    >as "locale" (in some unspecified sense) identifiers, but that
    >terminology certainly is not used by all. Indeed, as mentioned above, it
    >is counter to IETF practice as described in RFC 2277.
    >My contention is that it's unhelpful to refer to RFC 3066 as "locale"
    >tags. I have no problem with *using* RFC 3066 to name certain locales,
    >or to control the operational mode of software processes in certain
    >contexts. But saying that RFC 3066 tags are "locale" tags is decidedly
    >unhelpful in getting people to understand what are appropriate
    >requirements of implementations. While you may have a conceptualization
    >that distinguishes between "narrow" and "broad" senses of "locale",
    >there are at least some software implementers (and I suspect this
    >applies to most) that only know of "locale", without any distinction of
    >subtypes. As a result, people inevitably will end up confusing
    >namespaces for locales with the RFC 3066 namespace. My concern is that
    >this will lead to problems of interoperation, and will potentially
    >undermine RFC 3066.
    >Consider a couple of situations. First, someone needs to define in their
    >software a locale for (say) US English but with a 24-hour time format.
    >Yes, that falls in your broad rather than narrow sense of locale, but
    >there are lots of software implementers out there that don't know the
    >difference. All they know is that someone they consider knowledgeable in
    >i18n/g11n issues has referred to RFC 3066 tags as "locale tags". So,
    >they decide to name their locale "en-US-24hr". Then they write software,
    >or document their system leading others to write software, that inserts
    >this name into contexts like xml:lang. We know they shouldn't do it, but
    >they don't know that; and referring to RFC 3066 as "locale" tagging only
    >encouraged them to do this. And once they've done it, it can become a
    >problem that all of us have to work around.
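
One way to avoid the "en-US-24hr" mistake described above is to keep the locale's private name apart from the language tag that labels content. The following is only a minimal sketch: the `Locale` class, its field names, the `xml_lang_attribute` helper, and the locale name "en_US@24hr" are all hypothetical and come from no standard or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locale:
    # Application-private locale name (hypothetical); it never leaves the application.
    name: str
    # RFC 3066 language tag that describes the language of the content.
    language_tag: str
    # 12 or 24: a formatting preference, not a property of the language.
    hour_cycle: int


def xml_lang_attribute(locale: Locale) -> str:
    """Build an xml:lang attribute from the language tag only, never the locale name."""
    return f'xml:lang="{locale.language_tag}"'


# Hypothetical locale: US English with a 24-hour clock.
us_english_24h = Locale(name="en_US@24hr", language_tag="en-US", hour_cycle=24)

print(xml_lang_attribute(us_english_24h))
```

Under this assumed design the 24-hour preference stays inside the locale record, and only "en-US" is ever written into xml:lang.
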
    >Secondly, consider Mongolian. Documents written in Mongolian using
    >Mongolian script should be tagged (following the provisions of RFC
    >3066bis) as "mn-Mong". There is no distinction to be made between
    >whether these documents were written in Mongolia or in PRC. Therefore,
    >there's no need to tag the documents as "mn-Mong-CN" or "mn-Mong-MN".
    >But for software locales, this country distinction *is* important. So,
    >if a software implementer names their locale "mn-Mong-MN" and then
    >assumes they should insert that string into the Accept-Language header
    >of an HTTP request, there's a better than fair chance content will not
    >be returned according to what the user would prefer, because what they
    >want is "mn-Mong", and that's how the content is tagged, but because the
    >software implementer didn't understand that the intent of RFC 3066 and
    >the requirements for locales are not the same, the request that was sent
    >was overly specific.
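
The Mongolian case above can be illustrated with a small sketch. It assumes the server uses the prefix-matching rule that HTTP/1.1 (RFC 2616, section 14.4) describes for language ranges; real servers may match differently. The function name `range_matches` is hypothetical.

```python
def range_matches(language_range: str, language_tag: str) -> bool:
    """Return True if the range equals the tag, or is a prefix of it ending at a "-"."""
    rng = language_range.lower()
    tag = language_tag.lower()
    if rng == "*":
        # The wildcard range matches every tag.
        return True
    return tag == rng or tag.startswith(rng + "-")


# The content is tagged with language and script only, as the message recommends.
content_tag = "mn-Mong"

for requested in ("mn-Mong-MN", "mn-Mong"):
    print(requested, "matches", content_tag, "->", range_matches(requested, content_tag))
```

Under that rule the overly specific range "mn-Mong-MN" fails to match content tagged "mn-Mong", while the range "mn-Mong" matches it and would also match "mn-Mong-CN".
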
    >So, I will persist in trying to get people to understand that RFC 3066
    >tags are not "locale" tags, and ask that you not perpetuate confusion
    >that is out there.
    >Peter Constable
    >Globalization Infrastructure and Font Technologies
    >Microsoft Windows Division

    Michael Everson * Everson Typography

    This archive was generated by hypermail 2.1.5 : Mon Apr 26 2004 - 23:32:42 EDT