Locale vs. Language Tagging [Re: CJK tags - Fish or cut bait]

From: Glenn Adams (glenn@spyglass.com)
Date: Sun Jun 22 1997 - 21:17:04 EDT

You have been using the term "script" in the context of Apple's WorldScript
system. As I'm sure you know (though others may not), this use of the term
"script" is different from that employed by the Unicode Standard or the UTC.
Your use of the term "script", as used in the context of WorldScript, is much
closer to the "locale" concept employed in Unix and other environments. As
such, it implies a particular character encoding, a language, collation order,
input method(s), regional preferences, etc. So, I would claim that you are
essentially asking for locale tagging, as your examples clearly indicated (e.g.,
Simplified vs. Traditional Chinese); particularly since you want to use these
tags to map to the Apple concept of "locale".

The term "script" as used in the Unicode context is simply a set of characters,
independent of their encoding, and independent of the language(s) which employ
these characters in their written representation(s).

Before we (either the UTC or IETF) run off and standardize a mechanism for
language tagging, I suggest we spend some time seriously evaluating the need
to distinguish language and locale tags and whether any proposed mechanism should
provide adequate coverage for both of these requirements.

I know that you (and others) may overload language tags with locale tag semantics.
But my initial thoughts on this matter are that this would be undesirable and
have unanticipated side-effects in the long-term usage of this mechanism. Here
I quote from Ken Whistler's excellent summary of language tagging issues in his
message of Fri, 20 Jun 1997 17:41:30 -0700 (PDT) to <unicore@unicode.org> entitled
"White Paper on Language Tagging".

>2b. Two-part string value
> These are most often seen as a combination of a language
> code from ISO 639 and a country code from ISO-3166, now
> widely used as standard identifiers for "locales" in
> XPG4 and other contexts. Because they contain both a
> language code and a country code, strictly speaking these
> are not language tags--even though they are often
> treated as such. This has, in my opinion, occasioned massive
> confusion in the industry about the differences between
> language and locale support. (A good example is zh-TW
> vs. zh-CN. Both of these refer to the Chinese language,
> and most usually the Mandarin "dialect" of Chinese, but
> zh-TW is used to mark "Traditional Chinese", whereas
> zh-CN is used to mark "Simplified Chinese", with major
> implications for font resources, encoding, input
> methods, and other implementation details.)
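
To illustrate the point in the quoted passage, here is a minimal Python sketch
of how a two-part tag separates into a language part and a country part. The
names TwoPartTag, split_tag, same_language, and same_locale are hypothetical,
chosen only for this example, and the sketch assumes tags of the simple
"language-COUNTRY" form and nothing more elaborate.

```python
from typing import NamedTuple, Optional


class TwoPartTag(NamedTuple):
    """Hypothetical container for the two parts of a tag such as "zh-TW"."""
    language: str           # language code, e.g. "zh"
    country: Optional[str]  # country code, e.g. "TW"; None if absent


def split_tag(tag: str) -> TwoPartTag:
    """Split a "language-COUNTRY" tag at the first hyphen.

    Assumption: the tag has at most the two parts discussed above.
    """
    language, _, country = tag.partition("-")
    # An empty country string means the tag carried a language code only.
    return TwoPartTag(language.lower(), country.upper() or None)


def same_language(a: str, b: str) -> bool:
    """Compare only the language part, ignoring the country part."""
    return split_tag(a).language == split_tag(b).language


def same_locale(a: str, b: str) -> bool:
    """Compare both parts, i.e. treat the whole tag as a locale identifier."""
    return split_tag(a) == split_tag(b)


if __name__ == "__main__":
    print(same_language("zh-TW", "zh-CN"))  # language parts are both "zh"
    print(same_locale("zh-TW", "zh-CN"))    # country parts differ
```

Under these assumptions zh-TW and zh-CN compare equal as language tags but not
as locale tags, which is the overloading the passage warns about: the country
part is what carries the Traditional vs. Simplified distinction, not the
language part.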

Glenn Adams
