Re: plane business

From: Asmus Freytag
Date: Mon Oct 01 2001 - 22:42:46 EDT

There are 66 non-characters as of Unicode 3.1; there were 34 non-characters
in Unicode 3.0. There are no "hidden" non-characters, but there were 'hidden'
planes in Unicode 3.0: hidden in the limited sense that their code points
were defined as character locations, but no characters were assigned to them,
other than in the private use planes.
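
A small Python sketch (not part of the original message) that rebuilds the
two counts above, assuming the 17-plane code space U+0000..U+10FFFF. The last
two code points of every plane give 17 x 2 = 34, and the 32 code points
U+FDD0..U+FDEF added in 3.1 bring the total to 66. The helper names are
hypothetical.

```python
# Sketch: reconstruct the non-character counts quoted above.
# Assumption: the code space is 17 planes of 0x10000 code points each.
PLANES = 17

def plane_end_noncharacters() -> set[int]:
    """U+xxFFFE and U+xxFFFF in every plane (the Unicode 3.0 set)."""
    return {(plane << 16) | low for plane in range(PLANES) for low in (0xFFFE, 0xFFFF)}

def bmp_run_noncharacters() -> set[int]:
    """The contiguous BMP run U+FDD0..U+FDEF (added in Unicode 3.1)."""
    return set(range(0xFDD0, 0xFDF0))

unicode_3_0 = plane_end_noncharacters()
unicode_3_1 = unicode_3_0 | bmp_run_noncharacters()

print(len(unicode_3_0))  # 34
print(len(unicode_3_1))  # 66
```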

Since nothing of interest was going on in these upper reaches, Unicode 3.0
did very little (some might say, too little) in detailing the issues. Unicode
3.1 was unusual in that it introduced over 40,000 new characters. By rights,
it should have been an x.0 version, but we like to reserve the x.0 numbers for
when we can get a book out (4.0 is still well over a year away). Since 3.1 did
not re-issue the book, the text that could be written to reflect the sudden
filling of the supplementary planes had to be limited. This is all being
rectified for 4.0, but that takes time, and in the meantime there is the
frustration of having to piece things together from a UAX and the book. The
book contains some passages that will need to be revised but were not
superseded by replacement text in UAX #27; that was done to keep the UAX at
least readable.

> > BTW, it doesn't make sense for every code position
> > ending in FFFF or FFFE to be a non character.
>It doesn't make much sense, but it is the rule anyway.

This crept in during the merger of ISO/IEC 10646 and Unicode, and when it was
discovered, it was too late to do anything about it.
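
As an illustration of that rule, here is a hypothetical Python predicate (the
name is made up for this sketch). Because U+xxFFFE and U+xxFFFF differ only in
the lowest bit, a single mask test on the low 16 bits covers both code points
in every plane, whatever the plane number.

```python
def is_plane_end_noncharacter(cp: int) -> bool:
    """True for U+xxFFFE and U+xxFFFF in any of the 17 planes."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    # Clearing bit 0 maps ...FFFF onto ...FFFE, so one comparison
    # catches both plane-final code points.
    return (cp & 0xFFFE) == 0xFFFE

print(is_plane_end_noncharacter(0xFFFE))    # True
print(is_plane_end_noncharacter(0x10FFFF))  # True
print(is_plane_end_noncharacter(0xFFFD))    # False
print(is_plane_end_noncharacter(0xFDD0))    # False: the 3.1 run is a separate rule
```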

> > Why isn't the same rule applied to the "hidden" non
> > characters, so that every code value ending in FDD0 to
> > FDEF is also a non character? Is it to contribute to
> > their "hidden" nature?
>No. There is simply no reason to reserve them on the other planes.

The reason to put the additional non-characters (defined in 3.1) into the
BMP is to allow them to be represented by a single code unit in UTF-16
implementations, something that doesn't work so well if they are on the
higher planes, where every code point takes a surrogate pair.
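
A quick way to see the UTF-16 point, sketched in Python: a BMP code point such
as U+FDD0 is one 16-bit code unit, while a plane-1 non-character such as
U+1FFFE needs a surrogate pair. The helper names are illustrative only;
`to_surrogates` uses the standard UTF-16 surrogate arithmetic.

```python
def utf16_code_units(cp: int) -> int:
    """Number of 16-bit code units Python's UTF-16 codec emits for cp."""
    return len(chr(cp).encode("utf-16-le")) // 2

def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"not a supplementary code point: {cp:#x}")
    offset = cp - 0x10000
    # High surrogate carries the top 10 bits, low surrogate the bottom 10.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)

print(utf16_code_units(0xFDD0))   # 1: BMP non-character, one code unit
print(utf16_code_units(0x1FFFE))  # 2: plane-1 non-character, surrogate pair
print([hex(u) for u in to_surrogates(0x1FFFE)])  # ['0xd83f', '0xdffe']
```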

They are stuck out in the middle of the Arabic Presentation Forms-A block
because that was the only way we knew to reclaim that code space: we have
long agreed not to encode any more Arabic presentation forms. Since
non-characters really are 'reserved in perpetuity', that matched the de facto
status of that code area.


This archive was generated by hypermail 2.1.2 : Mon Oct 01 2001 - 21:24:16 EDT