From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Mar 31 2004 - 18:16:03 EST
> Surely Unicode didn't waste two planes for something that
> no one can practically use.
Plane 15 and Plane 16 private use characters weren't the
invention of the UTC, by the way. They derive from the
original specification of ISO/IEC 10646-1. From
ISO/IEC 10646-1:1993:
"The code positions of 32 planes from Plane E0 to Plane FF
of Group 00 shall be for Private Use.
"The code positions of the 32 groups from Group 60 to Group 7F
shall be for Private Use."
That would have been:
U-00E00000..U-00FFFFFD
U-60000000..U-7FFFFFFD
That was 8224 *planes* of private use code positions.
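
The plane count can be checked directly from the two ranges. A minimal Python sketch, assuming a plane is identified by everything above the low 16 bits of a code position; the helper name `planes_in_range` is made up for illustration and is not from any standard:

```python
def planes_in_range(first: int, last: int) -> int:
    """Count the 65,536-position planes spanned by first..last (inclusive)."""
    return (last >> 16) - (first >> 16) + 1

# Ranges as listed above for the original private use allocation.
group_00_planes = planes_in_range(0x00E00000, 0x00FFFFFD)  # Planes E0..FF of Group 00
private_groups = planes_in_range(0x60000000, 0x7FFFFFFD)   # Groups 60..7F, 256 planes each

total_planes = group_00_planes + private_groups
print(group_00_planes, private_groups, total_planes)  # 32 8192 8224
```
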
Amendment 1 (the one that defined UTF-16) amended that to
read:
"The code positions of the 32 groups from Group 60 to
Group 7F shall be for private use.
"The code positions of Plane 0F and Plane 10, and of the
32 planes from Plane E0 to Plane FF, of Group 00 shall
be for private use.
"The 6400 code positions E000 to F8FF of the Basic
Multilingual Plane shall be for private use."
That was 8226 *planes* of private use code positions,
besides the 6400 code positions on the BMP (which had
been defined earlier, but not spelled out in the same
clause with the rest of the private use allocation).
The addition of Plane 0F and Plane 10 was so there were
some private use planes accessible via UTF-16.
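
To illustrate why only Plane 0F and Plane 10 help here: a UTF-16 surrogate pair carries 20 bits, so it reaches no further than 0x10FFFF. The sketch below uses the standard surrogate arithmetic; the function name `to_surrogate_pair` is an illustrative choice, not part of any specification.

```python
def to_surrogate_pair(cp: int) -> tuple:
    """Return the UTF-16 (high, low) surrogate pair for a supplementary code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"{cp:#x} cannot be expressed as a UTF-16 surrogate pair")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low

# First position of Plane 0F and last private use position of Plane 10.
print([hex(u) for u in to_surrogate_pair(0xF0000)])   # ['0xdb80', '0xdc00']
print([hex(u) for u in to_surrogate_pair(0x10FFFD)])  # ['0xdbff', '0xdffd']

# A position in Plane E0 of Group 00 is out of reach for UTF-16.
try:
    to_surrogate_pair(0x00E00000)
except ValueError as exc:
    print(exc)
```
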
In that grand proliferation of "wastage", 10646 allowed for
539,089,084 private use code positions. That was a wee
tad more than anyone actually needed to use, by the way.
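
That total can be reproduced with a few lines of arithmetic. This is a sketch under one assumption inferred from the ranges quoted earlier, which end in FFFD: the last two positions of every plane (FFFE and FFFF) are not counted as private use.

```python
# Assumption: positions FFFE and FFFF of each plane are excluded.
POSITIONS_PER_PLANE = 0x10000 - 2          # 65,534

# Planes E0..FF of Group 00, Groups 60..7F (256 planes each), Planes 0F and 10.
PRIVATE_USE_PLANES = 32 + 32 * 256 + 2     # 8226

# E000..F8FF on the BMP.
BMP_PRIVATE_USE = 0xF8FF - 0xE000 + 1      # 6400

total = PRIVATE_USE_PLANES * POSITIONS_PER_PLANE + BMP_PRIVATE_USE
print(f"{total:,}")  # 539,089,084
```
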
More recent amendments to 10646 have simply settled on
the principle that *all* code positions beyond U-0010FFFF
are reserved, leaving the 6400 private use code positions
on the BMP, plus Plane 0F and Plane 10. In the grand scheme
of things, that seems to be the Goldilocks solution -- not
too small (6400) and not too big (539,089,084) -- but juuuust
right (137,468).
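
For reference, the three remaining private use ranges and the 137,468 figure can be written out as a small Python sketch. The names `PRIVATE_USE_RANGES` and `is_private_use` are illustrative only.

```python
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # BMP private use area (6400 positions)
    (0xF0000, 0xFFFFD),    # Plane 0F
    (0x100000, 0x10FFFD),  # Plane 10
]

def is_private_use(cp: int) -> bool:
    """True if cp falls in one of the three private use ranges."""
    return any(first <= cp <= last for first, last in PRIVATE_USE_RANGES)

count = sum(last - first + 1 for first, last in PRIVATE_USE_RANGES)
print(f"{count:,}")  # 137,468
print(is_private_use(0xF0000), is_private_use(0xFFFFE), is_private_use(0x41))  # True False False
```
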
There are people who have valid reasons for making use
of Plane 0F or Plane 10 private use characters, by the
way, but most of those reasons have to do with CJK. And
the reason for that should be pretty obvious -- only the
CJK script deals with the kind of entity numbers (multiple
tens of thousands) that make the 6400 code points of
the BMP PUA seem cramped. *Any* other unencoded script,
for example, with the possible exceptions of Egyptian
hieroglyphs or Tangut ideographs, would fit into the
BMP PUA with plenty of room to spare.
--Ken