Re: Making use of UTF-16 area for CJK

From: Martin J Duerst (mduerst@ifi.unizh.ch)
Date: Wed Aug 14 1996 - 07:40:56 EDT


Jake Morisson wrote:

>There has been a lot of argument about the merits of
>Han unification. I feel that the most important thing
>at this point is that Unicode/ISO 10646 be usable for
>the majority of people and a valid choice for
>implementation on a national basis in Asian countries.
>With Han unification, we have recorded the most common
>CJK characters and encoded them in the BMP where simple
>UCS-2 software can access them. Now we have to handle the rest.
>
>Most of the remaining CJK characters can be placed in one
>of three categories: names, national/local variants
>(e.g., Vietnamese or Cantonese characters) and rare/archaic
>characters interesting only to scholars.
>
>I think the best solution is to allocate parts of the
>UTF-16 area in blocks to the standards organizations in the
>individual countries and to scholarly groups. For example,
>Taiwan's CNS 11643 currently holds more than 50,000
>characters, with more on the way. Simply give them a block
>big enough to hold these characters (excluding those already
>encoded in the BMP).

Please don't. There is a tremendous advantage in having a single
authority responsible for allocations: you get everything
from a single source, in a language that most computer
experts understand, and the whole thing is more or less consistent.
"Give everybody a block" was the basic idea of the first ISO 10646
DIS, which fortunately got voted down. Let's stick with a truly
international standard, and not fall back to a multinational one.

>This will make Unicode immediately usable for any given
>country. Unicode can be chosen with the confidence that any
>local character worthy of recording in the national set will
>be included. Local needs will be satisfied--once a block is
>allocated for HK, there will be no complaining by Mandarin
>speakers that a Cantonese character should not be included.

I don't know of any case where such complaints were raised,
let alone accepted.

>Giving each national group their own block of characters might
>just be enough of an incentive that they would be willing to
>go outside of the BMP. Resulting software support for UTF-16
>or UCS-4 would open the door for real multilingual computing
>with a common character set.

Here you would definitely have to put "common" in quotes,
because the commonality would be no more than a word.

Regards, Martin.


