From: Michael Everson (everson@evertype.com)
Date: Mon Apr 26 2004 - 18:50:43 EDT
Surely this LONG message belongs on the RFC 3066 list.
Come ON, people.
At 09:25 -0700 2004-04-26, Peter Constable wrote:
>Mark:
>
>I really feel your usage of terminology here is unhelpful -- in very
>practical ways, unhelpful, because it makes it more difficult to get
>people to understand how to implement things in the right way.
>
>It may be that the application that most interests you is the naming of
>locales, but that does not change the fact that the notions of "locale"
>and "language" are different, and that the primary intent of RFC 1766
>and its successors has always been the identification of "languages", as
>the title and introduction to RFC 3066 indicate:
>
>"Tags for the Identification of Languages"
>
>"One means of indicating the language used is by labeling the
>information content with an identifier for the language that is used in
>this information content."
>
>Whether in your broad or narrow sense, a locale is an operational mode
>of a software application or of a software operating environment to
>provide culture-dependent tailoring.
>
>"Language" in the sense used by RFC 1766/3066 is a
>linguistically-related attribute of content, and a language identifier
>is used to label content to indicate that attribute, or to label
>resources (e.g. spelling checkers) that can appropriately be applied to
>that content. I think that's stated reasonably clearly in RFC 1766/3066.
>
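To make the locale/language distinction concrete, here is a minimal Python sketch. It assumes a hypothetical `LocaleSettings` record (not any real library's API) in which the language tag is only one attribute of a locale. Formatting preferences such as a 24-hour clock live alongside it, and only the language tag is used when labeling content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocaleSettings:
    """Hypothetical locale: an operational mode for software, not a language."""
    name: str                        # implementation-specific locale name (illustrative)
    content_language: str            # RFC 3066 tag for the language, e.g. "en-US"
    use_24_hour_clock: bool = False  # culture-dependent formatting preference


def label_content(text: str, locale: LocaleSettings) -> dict:
    """Label content with its language only; formatting preferences are left out."""
    return {"text": text, "xml:lang": locale.content_language}


loc = LocaleSettings(name="en_US_24hr", content_language="en-US", use_24_hour_clock=True)
print(label_content("Hello", loc))
```

The locale name and the clock preference never reach the `xml:lang` label. Only the attribute that actually describes the language of the content does.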
>One should also refer to RFC 2277, IETF Policy on Character Sets and
>Languages, which clearly distinguishes "language" tags and "locale"
>tags. In the IETF context, which is the context for RFC 1766/3066, those
>documents do *not* provide tags for locales; they provide tags
>for languages.
>
>
>> There is, as I have said, a perfectly reasonable, narrow sense of
>> locale which is essentially identical to what is captured by RFC 3066.
>
>But that does not mean that it's a good thing to refer to RFC 3066 tags
>as locale identifiers.
>
>> And in practice, RFC 3066 is often used with that meaning. I don't see any
>> need to deny reality (at least not in this area ;-)
>
>I think you overstate actual practice: For many years, various software
>implementations have used combinations of ISO 639-1 language identifiers
>and ISO 3166 country identifiers joined with an underscore to create
>locale identifiers; e.g. "en_US". It was not until Microsoft's .NET
>Framework that locales ('CultureInfo' in that context) were named
>using strings that *resemble* RFC 3066 tags -- and it needs to be
>pointed out that the namespace for CultureInfo.Name is not the same as
>the RFC 3066 namespace.
>
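The gap between the underscore-style locale namespace and RFC 3066 can be illustrated with a short Python sketch. It assumes the common `language_TERRITORY.codeset@modifier` naming convention used on POSIX-style systems. The helper `posix_locale_to_tag` is hypothetical, and the mapping shown is one plausible choice, not a standard one.

```python
import re

# Hypothetical pattern for names like "en_US", "en_US.UTF-8", "de_DE@euro".
_POSIX_LOCALE = re.compile(
    r"(?P<lang>[A-Za-z]{2,3})"
    r"(?:_(?P<territory>[A-Za-z]{2}))?"
    r"(?:\.(?P<codeset>[^@]+))?"
    r"(?:@(?P<modifier>.+))?"
)


def posix_locale_to_tag(name: str) -> str:
    """Return an RFC 3066-style tag for a POSIX-style locale name.

    The codeset and @modifier have no RFC 3066 counterpart here and are
    dropped, so the mapping is lossy and cannot be reversed.
    """
    m = _POSIX_LOCALE.fullmatch(name)
    if m is None:
        raise ValueError(f"not a POSIX-style locale name: {name!r}")
    tag = m.group("lang").lower()
    if m.group("territory"):
        tag += "-" + m.group("territory").upper()
    return tag


for name in ["en_US", "en_US.UTF-8", "sr_RS@latin"]:
    print(name, "->", posix_locale_to_tag(name))
```

Both "en_US" and "en_US.UTF-8" collapse to "en-US", and a modifier such as "@latin" is simply discarded. This is one reason the two namespaces cannot be treated as interchangeable.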
>It may be that you and some others have come to refer to RFC 3066 tags
>as "locale" (in some unspecified sense) identifiers, but that
>terminology certainly is not used by all. Indeed, as mentioned above, it
>is counter to IETF practice as described in RFC 2277.
>
>My contention is that it's unhelpful to refer to RFC 3066 tags as "locale"
>tags. I have no problem with *using* RFC 3066 to name certain locales,
>or to control the operational mode of software processes in certain
>contexts. But saying that RFC 3066 tags are "locale" tags is decidedly
>unhelpful in getting people to understand what are appropriate
>requirements of implementations. While you may have a conceptualization
>that distinguishes between "narrow" and "broad" senses of "locale",
>there are at least some software implementers (and I suspect this
>applies to most) that only know of "locale", without any distinction of
>subtypes. As a result, people inevitably will end up confusing
>namespaces for locales with the RFC 3066 namespace. My concern is that
>this will lead to problems of interoperation, and will potentially
>undermine RFC 3066.
>
>Consider a couple of situations. First, someone needs to define in their
>software a locale for (say) US English but with a 24-hour time format.
>Yes, that falls in your broad rather than narrow sense of locale, but
>there are lots of software implementers out there that don't know the
>difference. All they know is that someone they consider knowledgeable in
>i18n/g11n issues has referred to RFC 3066 tags as "locale tags". So,
>they decide to name their locale "en-US-24hr". Then they write software,
>or document their system leading others to write software, that inserts
>this name into contexts like xml:lang. We know they shouldn't do it, but
>they don't know that; and referring to RFC 3066 as "locale" tagging only
>encouraged them to do this. And once they've done it, it can become a
>problem that all of us have to work around.
>
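Part of what makes the "en-US-24hr" scenario dangerous is that the string is syntactically well-formed. The Python sketch below implements only the RFC 3066 section 2.1 grammar; the function name `is_well_formed_rfc3066` is hypothetical. It shows that a syntax check alone will not reject such a tag. Noticing that the tag is not registered would require consulting the IANA registry, which is not modeled here.

```python
import re

# RFC 3066, section 2.1:
#   Language-Tag   = Primary-subtag *( "-" Subtag )
#   Primary-subtag = 1*8ALPHA
#   Subtag         = 1*8(ALPHA / DIGIT)
_RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def is_well_formed_rfc3066(tag: str) -> bool:
    """Check RFC 3066 syntax only; registration and meaning are not checked."""
    return _RFC3066_TAG.fullmatch(tag) is not None


for candidate in ["en-US", "en-US-24hr", "en_US"]:
    print(candidate, is_well_formed_rfc3066(candidate))
```

"en-US-24hr" passes the grammar just as "en-US" does, while the underscore form "en_US" fails it.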
>Secondly, consider Mongolian. Documents written in Mongolian using
>Mongolian script should be tagged (following the provisions of RFC
>3066bis) as "mn-Mong". There is no distinction to be made between
>whether these documents were written in Mongolia or in the PRC. Therefore,
>there's no need to tag the documents as "mn-Mong-CN" or "mn-Mong-MN".
>But for software locales, this country distinction *is* important. So,
>if a software implementer names their locale "mn-Mong-MN" and then
>assumes they should insert that string into the Accept-Language header
>of an HTTP request, there's a better than fair chance content will not
>be returned according to what the user would prefer. What the user
>wants is "mn-Mong", and that's how the content is tagged; but because the
>software implementer didn't understand that the intent of RFC 3066 and
>the requirements for locales are not the same, the request that was sent
>was overly specific.
>
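The Mongolian case follows from the prefix-matching rule that HTTP Accept-Language uses (RFC 2616, section 14.4). Under that rule, a range matches a tag if it equals the tag, or equals a prefix of it followed by "-". The Python sketch below assumes exactly that rule, compared case-insensitively. The function name `range_matches` is hypothetical, and real servers may add their own fallback behavior.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """Accept-Language prefix matching as described in RFC 2616, section 14.4."""
    r, t = language_range.lower(), tag.lower()
    if r == "*":
        # Simplification: RFC 2616 gives "*" lower precedence than explicit ranges.
        return True
    return t == r or t.startswith(r + "-")


# Content tagged "mn-Mong", requested with increasingly general ranges.
for rng in ["mn-Mong-MN", "mn-Mong", "mn"]:
    print(rng, range_matches(rng, "mn-Mong"))
```

A request for "mn-Mong-MN" does not match content tagged "mn-Mong", while "mn-Mong" matches both "mn-Mong" and "mn-Mong-MN". That is why the more general language tag, rather than the country-specific locale name, is the appropriate thing to send.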
>So, I will persist in trying to get people to understand that RFC 3066
>tags are not "locale" tags, and ask that you not perpetuate the confusion
>that is out there.
>
>
>Peter
>
>Peter Constable
>Globalization Infrastructure and Font Technologies
>Microsoft Windows Division
-- Michael Everson * * Everson Typography * * http://www.evertype.com