I'm not sure if Ken's original message made it to the list (he had the list
address wrong in the Cc line), so I'm going to include all of his note in
On 6/4/97 at 6:07 PM -0500, Kenneth Whistler wrote:
>> If the text is supposed to be Japanese, and I
>> have both the Chinese and Japanese Language Kits installed, it will likely
>> be important to the user that I use a Japanese font instead of a Chinese
>> font when I display it on the screen. Without language tagging information,
>> I am unable to do that. It will be unacceptable to the user if I use the
>> incorrect font when the information comes from the ACAP server.
>This is a perfectly valid situation where "language" tagging
>is required. I don't think anyone would contest that, other
>than to point out that it isn't exactly language which is
>getting tagged. (This is the old bugaboo of Asian IT
>implementations: "Japanese" is a language with a single
>overall appropriate character style, with a single character standard
>(JIS) encoded in many variants and extensions; "Chinese" is several
>languages, spanning several countries, using two overall
>appropriate character styles ["Traditional" and "Simplified"],
>with several character standards [GB2312, Big5, CNS,...]
>encoded in many variants and extensions.) What we are really
>tagging is something like user-presentation-preference-for-Han-glyphs,
>which only roughly correlates with the language identity.
Right. In fact, in an implementation I'm working on, I don't actually tag
"language"; instead I tag "Macintosh script": one tag for Japanese, one for
Korean, one for Simplified Chinese, and one for Traditional Chinese.
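A minimal sketch of tagging by script rather than by language, in Python. The
four enum members mirror the tags listed above; the font names and the
pick_font helper are hypothetical placeholders for illustration, not part of
any actual implementation.

```python
from enum import Enum


class ScriptTag(Enum):
    """The four script tags named above (identifiers are illustrative)."""
    JAPANESE = "japanese"
    KOREAN = "korean"
    SIMPLIFIED_CHINESE = "simplified_chinese"
    TRADITIONAL_CHINESE = "traditional_chinese"


# Hypothetical font names, assumed only for this example.
FONT_FOR_SCRIPT = {
    ScriptTag.JAPANESE: "ExampleMincho",
    ScriptTag.KOREAN: "ExampleMyeongjo",
    ScriptTag.SIMPLIFIED_CHINESE: "ExampleSong",
    ScriptTag.TRADITIONAL_CHINESE: "ExampleMing",
}


def pick_font(tag, default="ExampleSystemFont"):
    """Return the font for a script tag, or the default for untagged text."""
    if tag is None:
        return default
    return FONT_FOR_SCRIPT[tag]
```

With a tag present, the display code can choose a Japanese font over a Chinese
one for the same Han code points; without one, it can only fall back to the
default.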
>O.k., so brush aside all the CJK issues per se, and let us
>simply state that your application needs to tag "something" in
>plain text records, something that we will agree to call "language"
>in lieu of a better word to communicate what it is.
>Then the issue is just how to design a workable tagging scheme
>in a hurry that meets the Crispin criteria. The basic objection
>I am hearing from the Unicode side is that it is a very bad
>choice to mung up the bytes of UTF-8 to do this. I would
>contend that an RFC that does equivalent tagging to what MLSF
>specifies, but which does it using a declared range of user-
>defined characters, will accomplish exactly what you need to
>do for "language" tagging without any layering on top of
>UTF-8 at all.
I agree completely. Especially if the Unicode consortium would come up with
a list of reserved characters that were specifically for tagging, I think
the problem would be solved. The more we all have to come up with "private"
unsanctioned solutions, the further we will get from interoperability.
Personally, I have used 5 of the corporate zone characters: one for each of
the 4 tags I mentioned above, plus one for "none specified". That works for
me. It might work for ACAP too, but it would be nice if the consortium would
decide on a way to do this.
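A rough Python sketch of this kind of in-band tagging. The five code points
below are arbitrary picks from the Unicode Private Use Area (U+E000..U+F8FF),
assumed purely for illustration; the actual characters are not given here, and
the function names are hypothetical.

```python
# Hypothetical tag characters: arbitrary Private Use Area code points.
TAGS = {
    "\uf8f0": None,                    # "none specified"
    "\uf8f1": "japanese",
    "\uf8f2": "korean",
    "\uf8f3": "simplified_chinese",
    "\uf8f4": "traditional_chinese",
}
TAG_CHAR = {script: ch for ch, script in TAGS.items()}


def tag_text(text, script=None):
    """Prefix text with the tag character for the given script."""
    return TAG_CHAR[script] + text


def split_runs(tagged):
    """Split tagged text into (script, text) runs.

    A tag character applies to everything up to the next tag character.
    Text that appears before any tag is reported with script None.
    """
    runs = []
    script, buf = None, []
    for ch in tagged:
        if ch in TAGS:
            if buf:
                runs.append((script, "".join(buf)))
                buf = []
            script = TAGS[ch]
        else:
            buf.append(ch)
    if buf:
        runs.append((script, "".join(buf)))
    return runs
```

Because the tags are ordinary characters, the tagged string encodes to and
decodes from standard UTF-8 unchanged; nothing is layered on top of the
encoding itself. The cost is that sender and receiver must agree privately on
the code points, which is the interoperability concern raised above.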
--
Pete Resnick <mailto:email@example.com>
QUALCOMM Incorporated
Work: (217)337-6377 / Fax: (217)337-1980