Re: Displaying Plane 1 characters

From: Asmus Freytag (
Date: Sun Nov 08 1998 - 14:32:19 EST

ISO 10646-2 will cover ALL planes other than plane 0.

I suggest that you instead use a designation

        *-iso10646-1 UCS Plane +00 (Basic Multilingual Plane)
        *-iso10646-2 UCS Plane +01 (Etruscan, Music, etc.)
        *-iso10646-2 UCS Plane +02
        *-iso10646-2 UCS Plane +03

Note that there are no leading 0's in the 'part' field for an ISO standard.
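
To make the mapping concrete, here is a minimal sketch (in Python, for brevity) that derives the plane number and the ISO 10646 part from a code point. It assumes the split stated above, with part 1 covering plane 0 and part 2 covering every other plane. The function name plane_and_part is made up for illustration.

```python
def plane_and_part(cp):
    """Return (plane, part) for a UCS code point.

    Assumption taken from this message: ISO 10646-1 covers plane 0,
    ISO 10646-2 covers all other planes.
    """
    plane = cp >> 16              # each plane holds 0x10000 code points
    part = 1 if plane == 0 else 2
    return plane, part

print(plane_and_part(0x0041))   # (0, 1)
print(plane_and_part(0x10208))  # (1, 2)
```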

>Java is also going to get problems: "\u10208" would be mistaken as
>U+1020 <undefined Mongolian character> U+0038 DIGIT EIGHT instead
>of U-00010208 ETRUSCAN LETTER TH.

\uD800\uDE08 is an obvious answer for Java, since Java's 16-bit data
type implies its use of UTF-16.
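
As a sketch of the arithmetic behind that pair (written in Python rather than Java, purely for brevity), the UTF-16 surrogates for a code point above U+FFFF can be computed as follows. The function name to_surrogates is made up for illustration.

```python
def to_surrogates(cp):
    """Split a code point in 0x10000..0x10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in planes 1 to 16")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go into the low surrogate
    return high, low

high, low = to_surrogates(0x10208)
print(f"\\u{high:04X}\\u{low:04X}")  # prints \uD800\uDE08
```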

>How is Unicode 3.0 going to deal with the extraplanar characters?
>Will they be sorted in as UTF-16 surrogate pairs like
>into the character database or will all character numbers be
>null-expanded from four to eight UCS-4 hexdigits?

We are currently working on rationalizing many aspects of our database.
This is one of the aspects that we need to look into. In working with
property lists etc., using UTF-16 is possible, but not the most convenient.
In terms of organizing character codes, the concept of planes has its use
as well - perhaps in the same way a book can be organized into volumes
or sections. This is a matter of how to present the information.

No matter what we come up with, it will not change the underlying definition
of UTF-16, UTF-8 etc. All the existing technical reasons for picking a
particular transformation format for data interchange will remain valid - no
matter how we arrange our table data for publication.
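
For illustration, here is a small Python sketch that encodes the one code point from this thread in three forms. It assumes Python's built-in codecs, and uses the "utf-32-be" codec as a stand-in for UCS-4, which gives the same bytes for this value.

```python
cp = 0x10208  # the plane 1 code point discussed in this thread

ch = chr(cp)
encodings = {
    "UTF-8": ch.encode("utf-8"),
    "UTF-16 (big-endian)": ch.encode("utf-16-be"),
    "UCS-4 (big-endian)": ch.encode("utf-32-be"),
}
for name, data in encodings.items():
    print(f"{name}: {data.hex(' ').upper()}")
# UTF-8: F0 90 88 88
# UTF-16 (big-endian): D8 00 DE 08
# UCS-4 (big-endian): 00 01 02 08
```

The byte sequences differ, but each one decodes back to the same code point, whatever layout the published tables use.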

>The (U-)000 prefix will be redundant if all future definitions stay
>within the 20-bit range addressable with UTF-16 and the leading zeroes
>are awkward to type. Isn't there a shorter notation like U+G208,
><U10208>, U=010208, or U*10208?
>I have seen people using a U+12345 notation even though ISO 10646 only
>allows either U+1234 or U-12345678.

Therefore U+12345 is not conformant to ISO 10646. It's really important
to know how many hex digits follow. U+1234 DEAD and U-1234 DEAD are two very
different things! (The first is the four-digit value U+1234 followed by the
text DEAD; the second is a single eight-digit value.) In some ways you can
say that the initial 00 after the U- can act as a nice redundancy check.
However, Plane 16 code points certainly begin with U-001!!
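
A strict reader for the two notations might look like the Python sketch below. It assumes only the two forms named in this thread are accepted, U+ with exactly four hex digits and U- with exactly eight. The name parse_ucs is made up for illustration.

```python
import re

# Assumption: only the two forms named in this thread are accepted,
# U+ with exactly four hex digits and U- with exactly eight.
_SHORT = re.compile(r"U\+([0-9A-Fa-f]{4})")
_LONG = re.compile(r"U-([0-9A-Fa-f]{8})")

def parse_ucs(text):
    """Return the code point for a strict U+XXXX or U-XXXXXXXX string, else None."""
    m = _SHORT.fullmatch(text) or _LONG.fullmatch(text)
    return int(m.group(1), 16) if m else None

print(parse_ucs("U+1234"))      # 4660
print(parse_ucs("U-00010208"))  # 66056
print(parse_ucs("U+12345"))     # None, five digits fit neither form
```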

