Re: RFC 3066 tags vs. locales (was RE: Common Locale Data Repository Project

From: Mark Davis
Date: Tue Apr 27 2004 - 16:10:34 EDT

    You do make some good points -- but I still disagree ;-)

    Sorry for not answering earlier -- I've been a bit swamped. Will try to get time
    soon to reply.

    ► शिष्यादिच्छेत्पराजयम् ◄

    ----- Original Message -----
    From: "Peter Constable"
    To: "Unicode List"
    Sent: Mon, 2004 Apr 26 09:25
    Subject: RFC 3066 tags vs. locales (was RE: Common Locale Data Repository

    > Mark:
    > I really feel your usage of terminology here is unhelpful -- in very
    > practical ways, unhelpful, because it makes it more difficult to get
    > people to understand how to implement things in the right way.
    > It may be that the application that most interests you is the naming of
    > locales, but that does not change the fact that the notions of "locale"
    > and "language" are different, and that the primary intent of RFC 1766
    > and its successors has always been identification of "languages", as
    > the title and introduction to RFC 3066 indicate:
    > "Tags for the Identification of Languages"
    > "One means of indicating the language used is by labeling the
    > information content with an identifier for the language that is used in
    > this information content."
    > Whether in your broad or narrow sense, a locale is an operational mode
    > of a software application or of a software operating environment to
    > provide culture-dependent tailoring.
    > "Language" in the sense used by RFC 1766/3066 is a
    > linguistically-related attribute of content, and a language identifier
    > is used to label content to indicate that attribute, or to label
    > resources (e.g. spelling checkers) that can appropriately be applied to
    > that content. I think that's stated reasonably clearly in RFC 1766/3066.
    > One should also refer to RFC 2277, IETF Policy on Character Sets and
    > Languages, which clearly distinguishes "language" tags and "locale"
    > tags. In the IETF context, which is the context for RFC 1766/3066, those
    > documents do *not* provide tags for locales; they provide tags
    > for languages.
    > > There is, as I have said, a perfectly reasonable, narrow sense of
    > > locale which is essentially identical to what is captured by RFC 3066.
    > But that does not mean that it's a good thing to refer to RFC 3066 tags
    > as locale identifiers.
    > > And in practice, RFC 3066 is often used with that meaning. I don't see
    > > any need to deny reality (at least not in this area ;-)
    > I think you overstate actual practice: For many years, various software
    > implementations have used combinations of ISO 639-1 language identifiers
    > and ISO 3166 country identifiers joined with an underscore to create
    > locale identifiers; e.g. "en_US". It was not until Microsoft's .Net
    > Framework that locales ('CultureInfo' in that context) have been named
    > using strings that *resemble* RFC 3066 tags -- and it needs to be
    > pointed out that the namespace for CultureInfo.Name is not the same as
    > the RFC 3066 namespace.
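
To illustrate the point above about strings that merely resemble RFC 3066 tags, here is a minimal sketch that checks a string against the RFC 3066 syntax only (a primary subtag of 1 to 8 letters, followed by hyphen-separated subtags of 1 to 8 letters or digits). The function names are hypothetical and chosen for illustration. Passing this check says nothing about whether the string is a meaningful or registered language tag.

```python
import re

# RFC 3066 syntax: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT),
# joined with hyphens. This checks the shape of the string and nothing more.
RFC3066_SYNTAX = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")


def has_rfc3066_syntax(tag: str) -> bool:
    """Return True if the string is syntactically well formed per RFC 3066."""
    return RFC3066_SYNTAX.fullmatch(tag) is not None


def locale_id_to_tag_shape(locale_id: str) -> str:
    """Hypothetical helper: turn an underscore-joined locale ID such as
    'en_US' into a hyphenated string with the shape of a language tag."""
    return locale_id.replace("_", "-")
```

Under these assumptions "en_US" fails the syntax check because of the underscore, while "en-US" passes. A made-up locale name such as "en-US-24hr" also passes, which is exactly why a syntax check alone cannot tell a language tag from a locale name.
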
    > It may be that you and some others have come to refer to RFC 3066 tags
    > as "locale" (in some unspecified sense) identifiers, but that
    > terminology certainly is not used by all. Indeed, as mentioned above, it
    > is counter to IETF practice as described in RFC 2277.
    > My contention is that it's unhelpful to refer to RFC 3066 as "locale"
    > tags. I have no problem with *using* RFC 3066 to name certain locales,
    > or to control the operational mode of software processes in certain
    > contexts. But saying that RFC 3066 tags are "locale" tags is decidedly
    > unhelpful in getting people to understand what are appropriate
    > requirements of implementations. While you may have a conceptualization
    > that distinguishes between "narrow" and "broad" senses of "locale",
    > there are at least some software implementers (and I suspect this
    > applies to most) that only know of "locale", without any distinction of
    > subtypes. As a result, people inevitably will end up confusing
    > namespaces for locales with the RFC 3066 namespace. My concern is that
    > this will lead to problems of interoperation, and will potentially
    > undermine RFC 3066.
    > Consider a couple of situations. First, someone needs to define in their
    > software a locale for (say) US English but with a 24-hour time format.
    > Yes, that falls in your broad rather than narrow sense of locale, but
    > there are lots of software implementers out there that don't know the
    > difference. All they know is that someone they consider knowledgeable in
    > i18n/g11n issues has referred to RFC 3066 tags as "locale tags". So,
    > they decide to name their locale "en-US-24hr". Then they write software,
    > or document their system leading others to write software, that inserts
    > this name into contexts like xml:lang. We know they shouldn't do it, but
    > they don't know that; and referring to RFC 3066 as "locale" tagging only
    > encouraged them to do this. And once they've done it, it can become a
    > problem that all of us have to work around.
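
One way an implementer might avoid the first situation is sketched below, purely as an assumption about how such software could be structured: the locale keeps its private name and its time-format preference separate from the language tag, and only the language tag is ever written into content. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocaleSettings:
    """Hypothetical locale record: the locale name is internal to the
    application, and the language tag is the only part used to label content."""
    name: str                # private locale name, e.g. "en-US-24hr"
    language_tag: str        # language tag for the content, e.g. "en-US"
    use_24_hour_clock: bool  # locale preference, not a property of the language

    def xml_lang_attribute(self) -> str:
        # Only the language tag is emitted; the locale name stays internal.
        return f'xml:lang="{self.language_tag}"'


settings = LocaleSettings(
    name="en-US-24hr", language_tag="en-US", use_24_hour_clock=True
)
```

With this split, `settings.xml_lang_attribute()` yields `xml:lang="en-US"`, and the 24-hour preference never leaks into the language label.
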
    > Secondly, consider Mongolian. Documents written in Mongolian using
    > Mongolian script should be tagged (following the provisions of RFC
    > 3066bis) as "mn-Mong". There is no distinction to be made between
    > whether these documents were written in Mongolia or in PRC. Therefore,
    > there's no need to tag the documents as "mn-Mong-CN" or "mn-Mong-MN".
    > But for software locales, this country distinction *is* important. So,
    > if a software implementer names their locale "mn-Mong-MN" and then
    > assumes they should insert that string into the accept-language header
    > of an HTTP request, there's a better than fair chance content will not
    > be returned according to what the user would prefer, because what they
    > want is "mn-Mong", and that's how the content is tagged, but because the
    > software implementer didn't understand that the intent of RFC 3066 and
    > the requirements for locales are not the same, the request that was sent
    > was overly specific.
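
The Mongolian case can be made concrete with the prefix-matching rule that RFC 3066 and HTTP/1.1 describe for language ranges: a range matches a tag if it equals the tag, or if it equals a prefix of the tag that is immediately followed by "-". The sketch below is a simplified illustration of that rule only; the function name is hypothetical and quality values are ignored.

```python
def range_matches_tag(language_range: str, tag: str) -> bool:
    """Prefix matching of a requested language range against a content tag.
    Comparison is case-insensitive; "*" matches any tag."""
    language_range = language_range.lower()
    tag = tag.lower()
    if language_range == "*":
        return True
    return tag == language_range or tag.startswith(language_range + "-")


# Content is tagged "mn-Mong"; the request used the locale name as the range.
too_specific = range_matches_tag("mn-Mong-MN", "mn-Mong")   # False
# Requesting the language tag itself matches the content.
as_intended = range_matches_tag("mn-Mong", "mn-Mong")       # True
# The shorter range would also match more specifically tagged content.
also_matches = range_matches_tag("mn-Mong", "mn-Mong-MN")   # True
```

The range "mn-Mong-MN" is longer than the content's tag, so it cannot be a prefix of it and the match fails, even though the content is what the user wanted.
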
    > So, I will persist in trying to get people to understand that RFC 3066
    > tags are not "locale" tags, and ask that you not perpetuate confusion
    > that is out there.
    > Peter
    > Peter Constable
    > Globalization Infrastructure and Font Technologies
    > Microsoft Windows Division

    This archive was generated by hypermail 2.1.5 : Tue Apr 27 2004 - 17:00:03 EDT