Re: Locale vs. Language Tagging [Re: CJK tags - Fish or cut bait]

From: Pete Resnick (presnick@qualcomm.com)
Date: Sun Jun 22 1997 - 22:59:03 EDT


On 6/22/97 at 8:17 PM -0500, Glenn Adams wrote:

>You have been using the term "script" in the context of Apple's WorldScript
>system. As I'm sure you know (though others may not), this use of the term
>"script" is different from that employed by the Unicode Standard or the UTC.

You're right. As I said in my message to Keld, I'll avoid this in the future.

>Your use of the term "script", as used in the context of WorldScript, is much
>closer to the "locale" concept employed in Unix and other environments. As
>such, it implies a particular character encoding, a language, collation order,
>input method(s), regional preferences, etc.

Actually, no, I think you've read too much into what Apple refers to as a
"script". Though a script implies certain (sets of) input methods in
WorldScript, it most certainly does not imply a particular language (though
a set of languages can often be implied: Yiddish and Hebrew are both in the
Hebrew script; Urdu and Farsi are both in the Arabic script; Russian,
Ukrainian, Turkmen, and Moldovan are all in the Cyrillic script; etc.). It
doesn't imply regional preferences (those are handled separately). It
implies a collation order only *between* scripts (that is, whether Cyrillic
comes before or after Arabic), not within a language. And a script does not
necessarily imply a character encoding (Apple does happen to encode each
script in only one character encoding, but it need not). Though there are a
couple of weird cases, a script usually implies only a particular writing
system. With the sole exception of CJK, every other writing system in
Unicode has a definitive mapping to an Apple "script".

>So, I would claim that you are essentially asking for locale tagging, as
>your examples clearly indicated (e.g., Simplified vs. Traditional Chinese);
>particularly since you want to use these tags to map to the Apple concept
>of "locale".

Actually, I would put Simplified and Traditional Chinese in the category of
"weird cases" that I mentioned above. I can't give you historical
perspective on why Apple chose to split those two into separate scripts,
but if it came down to it, I would be willing to sacrifice the distinction
between them if someone could argue that, unlike Japanese and Korean, it
*really* doesn't matter whether the two are kept separate.

>The term "script" as used in the Unicode context is simply a set of
>characters, independent of their encoding, and independent of the
>language(s) which employ these characters in their written
>representation(s).

I think that most of the time, this is Apple's use as well. I think there
is just some historical baggage.

>Before we (either the UTC or IETF) run off and standardize a mechanism for
>language tagging, I suggest we spend some time seriously evaluating the need
>to distinguish language and locale tags and whether any proposed mechanism
>should provide adequate coverage for both of these requirements.
>
>I know that you (and others) may overload language tags with locale tag
>semantics. But my initial thoughts on this matter are that this would be
>undesirable and have unanticipated side-effects in the long-term usage of
>this mechanism.

Actually, I think I am pretty much in agreement with you on this point. I
don't think that we should overload language tags to serve the purpose of
distinguishing CJK (call it "locale tagging" or whatever else you want).
That's partly why I'm pushing for this to be separate from language
tagging, which I think has other issues that need to be discussed (whether
it should be out-of-band, whether the ISO 639/3166 tags are the right ones
to use, etc.). I'm also not convinced that all of the people pushing for
language tagging really need it. I have an inkling that most of their
problems could be solved by distinguishing CJK; language tagging would then
have a more limited use, and worries about the weight of the mechanism
would become less important.

pr

--
Pete Resnick <mailto:presnick@qualcomm.com>
QUALCOMM Incorporated
Work: (217)337-6377 / Fax: (217)337-1980


