Re: CJK tags - Fish or cut bait

From: jenkins (jenkins@apple.com)
Date: Sat Jun 21 1997 - 23:40:45 EDT


On 6/21/97 10:01 AM Pete Resnick (presnick@qualcomm.com) wrote:

>On 6/20/97 at 6:37 PM -0500, Kenneth Whistler wrote:
>
>>
>> The architecture of the Macintosh WorldScript support for CJK means
>> that I *must* use some non-standard method to use Unicode as international
>> plain text.
>
>I don't necessarily see how any OS (Macintosh or otherwise) could possibly
>deal with 16-bit Unicode as editable plain text without some sort of CJK
>distinction. We all have to deal with input methods at some point.
>

Yes, but as I see it, the issue is whether the user wants to be switching
input methods back and forth all the time (or will even have more than
one installed).

I'm gonna have to give the whole issue some more thought.

>>The right way to approach this in the Unicode Consortium (as an industry
>>consortium, after all), is for Apple to bring in a clearly stated
>>implementation requirement for these 3 or 4 (?) additional codes. John Jenkins
>>has already proposed something of the sort in UTC, for just this problem.
>>If Apple can then convince the rest of the industry, as represented in
>>the Consortium, that the problem these corporate user-defined characters
>>are addressing is widespread and common enough (and commensurate for each
>>participant) to justify giving standard codes to them, then UTC would vote
>>for them and work with WG2 to see they got into 10646.
>
>I will talk to John about this, but it seems like the logical thing for the
>UTC to do.
>

Actually, the original proposal came from Kobayashi-san at Justsystem,
not from me. (It was just funnelled through me.)

The proposal was to add four source disambiguation characters for Unihan.
This is, you will note, slightly different from the language tagging
that Pete's been talking about.

>
>I've only got one big problem with the Plane 14 proposal: the Plane. The
>Unicode implementation on the Macintosh (and I'm guessing on lots of other
>platforms) only does good old-fashioned 16-bit Unicode. A tagging scheme
>that requires 32-bit characters is, for all intents and purposes, useless
>to me.
>

Actually, that isn't true. The Mac can handle surrogate-laden UTF-16 as
easily as surrogate-free. Both our main technologies for Unicode support
at the moment -- QuickDraw GX and the TEC -- have no problems with
surrogates, and the Unicode drawing API for QuickDraw that we're working
on will have surrogate support designed in from the start.

Are you perhaps confusing surrogates and UCS-4? You don't need UCS-4 to
handle Plane 14. It's in the surrogate space.
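
To make the surrogate point concrete, here is a minimal sketch of the
arithmetic that maps a Plane 14 code point onto two 16-bit units. The
function names are made up for illustration and are not part of any Apple
API, and U+E0041 is just an arbitrary Plane 14 value picked as an example.

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in the surrogate-encoded range")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return high, low


def from_surrogates(high, low):
    """Recombine a high/low surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Arbitrary Plane 14 code point, chosen only as an example.
high, low = to_surrogates(0xE0041)
print(f"U+E0041 -> {high:04X} {low:04X}")
```

Both halves are ordinary 16-bit values, so text carrying them is still
plain UTF-16; no 32-bit characters are involved.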

>Personally, I don't think ACAP is going to pass the IETF "interoperable
>implementations" test if they require Plane 14 language tags, or any 32-bit
>10646 for that matter. I'm pretty sure the Macintosh Unicode converter (in
>its current incarnation) would spit back error messages on 32-bit
>characters at this point, and I'll bet there are other Unicode implementations
>that do the same thing.

Er, actually, the Macintosh Unicode converter handles UTF-16 text without
the slightest problem, not a burp, not a hiccup.

>But that's an issue for the ACAP group and the
>IETF. I'm not saying language tags are a bad idea, nor am I saying that
>standardizing them in Plane 14 wouldn't be a fine thing. I'm just not
>convinced that ACAP (or Eudora) will be using them any time soon.
>
>I urge the UTC to adopt some set of characters for CJK distinction within
>16-bit Unicode.
>

I'll talk to Peter about adding the four script-disambiguation tags to
Apple's user zone implementation.

Any volunteers to raise this at UTC in August?

=====
John H. Jenkins
jenkins@apple.com
tseng@blueneptune.com
http://www.blueneptune.com/~tseng


