Re: Displaying Plane 1 characters

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sun Nov 08 1998 - 14:32:19 EST


ISO 10646-2 will cover ALL planes other than plane 0.

I suggest that you instead use a designation

        *-iso10646-1 UCS Plane +00 (Basic Multilingual Plane)
        *-iso10646-2 UCS Plane +01 (Etruscan, Music, etc.)
        *-iso10646-2 UCS Plane +02
        *-iso10646-2 UCS Plane +03
        ...

Note that there are no leading 0's in the 'part' field for an ISO standard.

>Java is also going to get problems: "\u10208" would be mistaken as
>U+1020 <undefined Mongolian character> U+0038 DIGIT EIGHT instead
>of U-00010208 ETRUSCAN LETTER TH.

\uD800\uDE08 is an obvious answer for Java, since Java's 16-bit data
type implies its use of UTF-16.
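
To spell out the arithmetic behind that pair, here is a minimal sketch (not
taken from any Java library; the class and method names are made up for
illustration). It assumes the standard UTF-16 rule: subtract 0x10000, then
put the top 10 bits on 0xD800 and the bottom 10 bits on 0xDC00.

```java
class SurrogateSketch {
    // Hypothetical helper: split a code position in planes 1..16
    // (0x10000..0x10FFFF) into a UTF-16 high/low surrogate pair.
    static char[] toSurrogates(int codePoint) {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("not reachable via surrogates");
        }
        int offset = codePoint - 0x10000;               // 20-bit value
        char high = (char) (0xD800 + (offset >> 10));   // top 10 bits
        char low  = (char) (0xDC00 + (offset & 0x3FF)); // bottom 10 bits
        return new char[] { high, low };
    }

    // Hypothetical helper: the inverse mapping, pair back to code position.
    static int fromSurrogates(char high, char low) {
        return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x10208);
        // prints "D800 DE08"
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]);
        // prints "00010208"
        System.out.printf("%08X%n", fromSurrogates(pair[0], pair[1]));
    }
}
```

In source code the same character would then be written as the string
literal "\uD800\uDE08", i.e. two 16-bit char values.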

>How is Unicode 3.0 going to deal with the extraplanar characters?
>Will they be sorted in as UTF-16 surrogate pairs like
>
>D800 DE08:ETRUSCAN LETTER TH
>
>into the character database or will all character numbers be
>null-expanded from four to eight UCS-4 hexdigits?
>
>00010208:ETRUSCAN LETTER TH

We are currently working on rationalizing many aspects of our database.
This is one of the aspects that we need to look into. In working with
property lists etc., using UTF-16 is possible, but not the most convenient.
In terms of organizing character codes, the concept of planes has its use
as well - perhaps in the same way that a book can be organized into volumes
or sections. This is a matter of how to present the information.

No matter what we come up with, it will not change the underlying definition
of UTF-16, UTF-8 etc. All the existing technical reasons for picking a
particular transformation format for data interchange will remain valid - no
matter how we arrange our table data for publication.

>
>The (U-)000 prefix will be redundant if all future definitions stay
>within the 20-bit range addressable with UTF-16 and the leading zeroes
>are awkward to type. Isn't there a shorter notation like U+G208,
><U10208>, U=010208, or U*10208?
>
>I have seen people using a U+12345 notation even though ISO 10646 only
>allows either U+1234 or U-12345678.

Therefore U+12345 is not conformant to ISO 10646. It's really important
to know how many hex digits follow. U+1234 DEAD and U-1234 DEAD are two very
different things! In some ways you can say that the initial 00 after the U-
can act as a nice redundancy check. However, Plane 16 certainly has U-001!!
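
To make the fixed-length rule concrete, here is a rough sketch of a strict
reader for the two forms (class and method names are hypothetical, and it
assumes the identifier is written without embedded spaces): U+ must be
followed by exactly four hex digits, U- by exactly eight, and anything else
is rejected.

```java
class UcsNotationSketch {
    // Hypothetical helper: accept only U+XXXX (4 hex digits) or
    // U-XXXXXXXX (8 hex digits) and return the code position.
    static long parse(String s) {
        if (s.matches("U\\+[0-9A-Fa-f]{4}")) {
            return Long.parseLong(s.substring(2), 16);
        }
        if (s.matches("U-[0-9A-Fa-f]{8}")) {
            return Long.parseLong(s.substring(2), 16);
        }
        throw new IllegalArgumentException("not a conformant identifier: " + s);
    }

    public static void main(String[] args) {
        System.out.printf("%X%n", parse("U+1234"));     // prints "1234"
        System.out.printf("%X%n", parse("U-00010208")); // prints "10208"
        System.out.printf("%X%n", parse("U-1234DEAD")); // prints "1234DEAD"
        try {
            parse("U+12345");                           // five digits: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the digit count is fixed by the prefix, a reader never has to guess
where the number ends.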

A./


