Re: Displaying Plane 1 characters (annotating the code table

From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Nov 09 1998 - 15:37:17 EST


Markus Scherer noted:

> However, it probably makes sense for files as an easy and somewhat compact
> format, and it makes sense for the number of possible characters: 1M + 64k,
> including 128k+6400 private use character code points. There are about 38000
> characters assigned so far, with about 20000-30000 more in the pipeline.

Here are the exact counts of what is currently encoded and what Unicode 3.0
will contain (synchronized with the prospective content of the republication
of ISO/IEC 10646-1):

Unicode 2.1:

 6813 Misc. characters
20902 Unihan
11172 Johab Hangul
 6400 Private use
 2048 Surrogates
   65 Controls
    2 Not characters
18134 Unassigned assignable

38887 Assigned graphic characters

Unicode 3.0 (prospective, as of November 3, 1998):

10554 Misc. characters
20902 Unihan
 6582 Unihan Extension A
11172 Johab Hangul
 6400 Private use
 2048 Surrogates
   65 Controls
    2 Not characters
 7811 Unassigned assignable

49210 Assigned graphic characters

That is a net gain of 10323 new characters.
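
For anyone who wants to cross-check the tallies, here is a small Python
sketch. The figures are copied from the two tables above; the dictionary
names, the GRAPHIC_ROWS grouping, and the graphic_total helper are only
illustrative and not part of any standard. It assumes that each table is
meant to account for every one of the 65536 code positions of the BMP.

```python
# Sanity check of the tallies above. Names here are illustrative only.
BMP_SIZE = 0x10000  # 65536 code positions in the Basic Multilingual Plane

unicode_2_1 = {
    "Misc. characters": 6813,
    "Unihan": 20902,
    "Johab Hangul": 11172,
    "Private use": 6400,
    "Surrogates": 2048,
    "Controls": 65,
    "Not characters": 2,
    "Unassigned assignable": 18134,
}

unicode_3_0 = {
    "Misc. characters": 10554,
    "Unihan": 20902,
    "Unihan Extension A": 6582,
    "Johab Hangul": 11172,
    "Private use": 6400,
    "Surrogates": 2048,
    "Controls": 65,
    "Not characters": 2,
    "Unassigned assignable": 7811,
}

# Rows that are summed into the "Assigned graphic characters" line.
GRAPHIC_ROWS = ("Misc. characters", "Unihan", "Unihan Extension A", "Johab Hangul")


def graphic_total(table):
    """Sum the rows that count as assigned graphic characters."""
    return sum(table.get(row, 0) for row in GRAPHIC_ROWS)


for name, table in (("Unicode 2.1", unicode_2_1), ("Unicode 3.0", unicode_3_0)):
    # Assumption: every BMP code position falls into exactly one row.
    assert sum(table.values()) == BMP_SIZE
    print(name, "assigned graphic characters:", graphic_total(table))

net_gain = graphic_total(unicode_3_0) - graphic_total(unicode_2_1)
print("Net gain:", net_gain)
```

Both tables add up to 65536, and the difference between the two graphic
totals matches the net gain quoted above.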

Others have noted the following, but I would like to reiterate, so that
*correct* rumors can circulate, instead of incorrect ones:

Unicode 3.0 will *not* contain any encoded characters requiring surrogates.
The republication of ISO/IEC 10646-1 will *not* contain any encoded
   characters outside of the Basic Multilingual Plane.

Planes 1, 2, and 14 are for ISO/IEC 10646-2, which is still in
working draft and which has not yet even started a CD ballot. When 10646
Part 2 progresses far enough, we anticipate publishing a Version 4.0 of
the Unicode Standard -- and *that* will make use of surrogate codes
to access encoded characters on Planes 1 and beyond.

--Ken Whistler


