Re: Rationale for U+10FFFF?

From: Doug Ewell (dewell@compuserve.com)
Date: Sun Mar 05 2000 - 21:59:22 EST


Harald Tveit Alvestrand <Harald@Alvestrand.no> wrote:

> the current trend in UNICODE/ISO 10646 seems to be to limit the number
> of planes to 17 (U+0 to U+10FFFF).
> Can someone tell me the rationale for not deprecating plane 16, and
> leave us with the much more rational U+0 to U+FFFFF?
> Or even allocating the whole bit, and making it U+0 to U+1FFFFF?

The "obvious" answer is that U-0010FFFF is the highest code point
accessible through the UTF-16 retrofit. You couldn't allocate all the
way up to U-001FFFFF (plane 31) because everything from plane 17 on
could not be reached with UTF-16.
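To make the arithmetic concrete, here is a minimal Python sketch of how
a UTF-16 surrogate pair maps to a code point. The function and constant
names are my own, for illustration only; the surrogate ranges are the
standard ones. Each surrogate carries 10 bits, so a pair covers 2^20
code points starting at U+10000, which ends exactly at U+10FFFF:

```python
# Illustrative UTF-16 surrogate-pair decoding (hypothetical helper names).
HIGH_MIN, HIGH_MAX = 0xD800, 0xDBFF  # high (leading) surrogates
LOW_MIN, LOW_MAX = 0xDC00, 0xDFFF    # low (trailing) surrogates


def decode_pair(high: int, low: int) -> int:
    """Combine a surrogate pair into a supplementary code point."""
    if not (HIGH_MIN <= high <= HIGH_MAX and LOW_MIN <= low <= LOW_MAX):
        raise ValueError("not a valid surrogate pair")
    # 10 bits from each surrogate give a 20-bit value, offset by 0x10000.
    return 0x10000 + ((high - HIGH_MIN) << 10) + (low - LOW_MIN)


lowest = decode_pair(HIGH_MIN, LOW_MIN)
highest = decode_pair(HIGH_MAX, LOW_MAX)
print(hex(lowest), hex(highest))  # 0x10000 0x10ffff
print(highest >> 16)              # 16, the last plane UTF-16 can reach
```

Python's own codec agrees: chr(0x10FFFF).encode("utf-16-be") gives the
bytes DB FF DF FF, and chr(0x110000) is rejected outright.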

*Deprecating* plane 16... hmm, that seems logical, doesn't it? Unicode
has *almost* done that by declaring it a private-use plane. I wonder if
you still have to support plane 16 (the 21st bit) to be fully Unicode-
compliant (a term I know is loaded and has no one simple definition).
Perhaps your program would not be considered fully compliant, but who
would know? Only those who have defined private-use characters in plane
16, and those who are testing specifically for this case.

> To me, this seems on a par with the ISO session layer that mandated
> sequence number ranges of 0 to 99999 (0x1869F - a 17-bit number) -
> something that will cause readers for tens of years to come to shake
> their heads and say "these guys didn't know what they were doing";
> checks for legality now need a range compare, not just an AND
> operation. Similar for UTF-8 encoders/decoders; this extra plane will
> haunt implementations for years to come.

I agree; it certainly feels like a wart.
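The difference Harald describes can be sketched in Python. The two mask
functions are hypothetical, modelling the 20-bit (U+0..U+FFFFF) and
21-bit (U+0..U+1FFFFF) code spaces he suggested; only in_codespace
reflects the real U+10FFFF limit. decode_utf8_4byte is likewise an
illustrative helper, not any particular library's decoder. A 4-byte
UTF-8 sequence carries 21 payload bits, so it is the extra range compare
that enforces the ceiling:

```python
def in_20bit_space(cp: int) -> bool:
    # Hypothetical U+0..U+FFFFF space: one AND tests all high bits at once.
    return (cp & ~0xFFFFF) == 0


def in_21bit_space(cp: int) -> bool:
    # Hypothetical U+0..U+1FFFFF space: still a single AND.
    return (cp & ~0x1FFFFF) == 0


def in_codespace(cp: int) -> bool:
    # Actual Unicode/ISO 10646 limit: needs a range compare.
    return 0 <= cp <= 0x10FFFF


def decode_utf8_4byte(seq: bytes) -> int:
    """Decode one 4-byte UTF-8 sequence (lead byte 11110xxx)."""
    if (len(seq) != 4 or (seq[0] & 0xF8) != 0xF0
            or any((b & 0xC0) != 0x80 for b in seq[1:])):
        raise ValueError("malformed 4-byte sequence")
    # 3 payload bits from the lead byte + 6 from each continuation byte = 21.
    cp = (((seq[0] & 0x07) << 18) | ((seq[1] & 0x3F) << 12)
          | ((seq[2] & 0x3F) << 6) | (seq[3] & 0x3F))
    # The bit layout alone would allow up to U+1FFFFF; this compare enforces
    # U+10FFFF (and rejects overlong encodings of values below U+10000).
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:X} is out of range for a 4-byte sequence")
    return cp


print(in_20bit_space(0x10FFFF), in_21bit_space(0x10FFFF), in_codespace(0x10FFFF))
# False True True
print(in_21bit_space(0x110000), in_codespace(0x110000))
# True False
print(hex(decode_utf8_4byte(b"\xf4\x8f\xbf\xbf")))
# 0x10ffff
```

A complete decoder has more to check than this (for example, rejecting
the surrogates U+D800..U+DFFF in 3-byte sequences); the sketch covers
only the part that bears on the plane-16 question.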

-Doug Ewell
 Fullerton, California



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:59 EDT