Re: CJK tags - Fish or cut bait

From: Pete Resnick (presnick@qualcomm.com)
Date: Sun Jun 22 1997 - 16:36:04 EDT


On 6/21/97 at 10:39 PM -0500, John Jenkins wrote:

>On 6/21/97 10:01 AM Pete Resnick (presnick@qualcomm.com) wrote:
>
>>I don't necessarily see how any OS (Macintosh or otherwise) could possibly
>>deal with 16-bit Unicode as editable plain text without some sort of CJK
>>distinction. We all have to deal with input methods at some point.
>
>Yes, but as I see it, the issue is whether the user wants to be switching
>input methods back and forth all the time (or will even have more than
>one installed).

John, you guys sell these language kits, and allow more than one of them to
be installed at once. I am not in the habit of writing non-deterministic
code. I *must* choose which Macintosh script system to use when I'm
converting characters in the CJK range. My code can't crash, or even do
something dumb and non-deterministic if the user happens to have more than
one language kit installed. Furthermore, when the user types something in
one of these scripts and I save it to disk in Unicode (as I plan to do), it
had better come back from the disk, display, and be editable by the user in
the same way it was when they typed it.

This is not an issue of the user "switching input methods back and forth
all the time"; it's a matter of choosing one from the current list of
installed choices.
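
A minimal sketch of what "choosing one from the installed choices"
could look like, in Python rather than anything Mac-specific. The script
names, the preference order, and the function choose_cjk_script are all
hypothetical labels for illustration, not Script Manager constants or
calls. The only point is that the same inputs always give the same
answer, and that a choice saved with the text wins when it is still
installed.

```python
# Hypothetical script labels and ordering, for illustration only.
PREFERENCE_ORDER = ["Japanese", "TraditionalChinese", "SimplifiedChinese", "Korean"]


def choose_cjk_script(installed, saved_tag=None, preference=PREFERENCE_ORDER):
    """Pick one CJK script system deterministically.

    installed  -- collection of script labels currently installed
    saved_tag  -- script recorded with the text when it was saved, if any
    preference -- fixed fallback order used when no usable tag is present
    """
    # Text that was saved with a disambiguating tag should reopen the same
    # way, provided that script is still installed.
    if saved_tag in installed:
        return saved_tag
    # Otherwise fall back to a fixed order, so the result never depends on
    # the order in which the kits happen to be enumerated.
    for script in preference:
        if script in installed:
            return script
    # No CJK script system installed at all.
    return None
```

With both a Korean and a Japanese kit installed and no saved tag, this
sketch always answers "Japanese"; text saved with a "Korean" tag still
reopens as Korean.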

>I'm gonna have to give the whole issue some more thought.

I will contact you offline to discuss this more.

>Actually, the original proposal came from Kobayashi-san at Justsystem,
>not from me. (It was just funnelled through me.)
>
>The proposal was to add four source disambiguation characters for Unihan.
>This is, you will note, slightly different from the language tagging
>that Pete's been talking about.

Please don't attribute the language tagging proposal to me. Though language
tagging is a fine thing (more likely out-of-band than in-band), I am not
pushing for that. All I want is CJK disambiguation.

>>I've only got one big problem with the Plane 14 proposal: the Plane. The
>>Unicode implementation on the Macintosh (and I'm guessing on lots of other
>>platforms) only does good old-fashioned 16-bit Unicode. A tagging scheme
>>that requires 32-bit characters is, for all intents and purposes, useless
>>to me.
>
>Actually, that isn't true. The Mac can handle surrogate-laden UTF-16 as
>easily as surrogate-free. Both our main technologies for Unicode support
>at the moment -- QuickDraw GX and the TEC -- have no problems with
>surrogates, and the Unicode drawing API for QuickDraw that we're working
>on will have surrogate support designed in from the start.
>
>Are you perhaps confusing surrogates and UCS-4? You don't need UCS-4 to
>handle Plane 14. It's in the surrogate space.

I think you're right; I am confusing these. Again, I'll communicate with
you offline about this, unless you think it would be of public benefit to
explain the difference between things in surrogate space and UCS-4.
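
For reference, a short Python sketch of the distinction, under the
assumption that U+E0001 is used as a sample Plane 14 code point (the
helper names are made up for this example). In UTF-16 a code point
above U+FFFF is carried as two 16-bit code units, a high surrogate in
D800..DBFF followed by a low surrogate in DC00..DFFF. In UCS-4 the same
code point is a single 32-bit unit. So 16-bit storage is enough for
Plane 14, as long as the code treats the pair as one character.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into two UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Sample Plane 14 code point (assumption chosen for this example).
PLANE_14_EXAMPLE = 0xE0001

high, low = to_surrogate_pair(PLANE_14_EXAMPLE)

# The same character in the two encoding forms:
utf16_bytes = chr(PLANE_14_EXAMPLE).encode("utf-16-be")  # two 16-bit units
ucs4_bytes = chr(PLANE_14_EXAMPLE).encode("utf-32-be")   # one 32-bit unit

print(f"UTF-16: {high:04X} {low:04X}")
print(f"UCS-4:  {PLANE_14_EXAMPLE:08X}")
```

The pair round-trips through from_surrogate_pair, which is all a 16-bit
implementation needs in order to recognize a Plane 14 character.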

>>I urge the UTC to adopt some set of characters for CJK distinction within
>>16-bit Unicode.
>
>I'll talk to Peter about adding the four script-disambiguation tags to
>Apple's user zone implementation.

As soon as you have these available, please let me know.

pr

--
Pete Resnick <mailto:presnick@qualcomm.com>
QUALCOMM Incorporated
Work: (217)337-6377 / Fax: (217)337-1980



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:35 EDT