L2/01-262

From: Markus Scherer [markus.scherer@jtcsv.com]
Sent: Wednesday, June 20, 2001 3:07 PM

Subject: UTC Agenda Item: Proposal to reserve d7c0..d7ff for internal use


Mark Davis wrote:
> Markus Scherer noticed that one could apply Formula 1 to certain BMP points,

Thanks, Mark. I would like to expand on this and propose reserving
U+d7c0..U+d7ff permanently for internal use, in case such UTF-16 variants turn out to be useful to someone.

To simplify your example and achieve both
- code point order = code unit order, and
- unambiguous encoding of all code points,

one could simply encode all code points U+d7f5..U+10ffff with surrogate pairs
as described; the first "lead surrogate" is then U+d7f5 itself.
In fact, the lower limit of this range could be anywhere from U+0000 to U+d7f5,
which would require between 11 and 64 additional "lead surrogates" compared to
UTF-16 (d7f5..d7ff for the highest limit, d7c0..d7ff for U+0000).
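The scheme can be sketched in Python. This is a minimal illustration under stated assumptions, not a specified encoding form. It assumes "Formula 1" is the usual UTF-16 surrogate-pair arithmetic (lead = d800 + ((c - 10000) >> 10), trail = dc00 + ((c - 10000) & 3ff)), applied with a floor shift so that BMP code points get leads below d800. The names `lead_for`, `check_limit`, `encode`, `decode` and `MAX_LIMIT` are hypothetical. The test `limit <= lead_for(limit)` is what makes U+d7f5 the highest usable lower limit: for any limit from U+d7f6 up, the first lead would be a value that is still used as a single code unit.

```python
# Hypothetical sketch of the UTF-16 variant discussed here (not a specified
# encoding form): code points below `limit` are single code units; code points
# limit..U+10FFFF use the surrogate-pair formula, extended so that leads may
# fall below 0xD800 (down to 0xD7C0).

MAX_LIMIT = 0xD7F5  # highest lower limit that avoids collisions


def lead_for(cp: int) -> int:
    """Lead unit from the UTF-16 formula. Python's >> floors, so BMP
    code points yield leads below 0xD800."""
    return 0xD800 + ((cp - 0x10000) >> 10)


def check_limit(limit: int) -> None:
    # Single units 0..limit-1 must all sort below (and differ from)
    # the first lead unit, i.e. limit <= lead_for(limit).
    if not (0 <= limit <= 0x10FFFF) or limit > lead_for(limit):
        raise ValueError(f"invalid lower limit U+{limit:04X}")


def encode(cp: int, limit: int = MAX_LIMIT) -> list[int]:
    check_limit(limit)
    if not (0 <= cp <= 0x10FFFF):
        raise ValueError("not a code point")
    if cp < limit:
        return [cp]
    return [lead_for(cp), 0xDC00 + ((cp - 0x10000) & 0x3FF)]


def decode(units: list[int], limit: int = MAX_LIMIT) -> list[int]:
    check_limit(limit)
    first_lead = lead_for(limit)
    out: list[int] = []
    i = 0
    while i < len(units):
        u = units[i]
        if u < limit:  # single code unit (always below first_lead)
            out.append(u)
            i += 1
            continue
        trail = units[i + 1] if i + 1 < len(units) else -1
        if not (first_lead <= u <= 0xDBFF and 0xDC00 <= trail <= 0xDFFF):
            raise ValueError(f"ill-formed sequence at index {i}")
        cp = 0x10000 + ((u - 0xD800) << 10) + (trail - 0xDC00)
        if cp < limit:  # a pair for a code point that must be a single unit
            raise ValueError(f"non-canonical pair at index {i}")
        out.append(cp)
        i += 2
    return out
```

Comparing the encoded lists lexicographically is the same as comparing code unit strings, so encodings of code points in ascending order come out already sorted. `0xD800 - lead_for(limit)` gives the number of extra leads: 11 for the highest limit, 64 for U+0000.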

It is not necessary to encode U+d7f8..U+d7ff with single code units.

(Your table had U+e000..U+ffff without the surrogate pair encoding, which
yields an unambiguous encoding but not code point order.)
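Standard UTF-16 has the same property, since it too leaves U+e000..U+ffff as single code units. A quick check with Python's built-in `utf-16-be` codec illustrates the order inversion (big-endian, so byte order equals 16-bit code unit order); it demonstrates ordinary UTF-16 only.

```python
# Standard UTF-16 leaves U+E000..U+FFFF as single code units, so they
# compare above every surrogate pair even though their code points are lower.
bmp_high = "\uffff".encode("utf-16-be")      # b'\xff\xff'
supp_low = "\U00010000".encode("utf-16-be")  # b'\xd8\x00\xdc\x00'

print(ord("\uffff") < ord("\U00010000"))  # True: code point order
print(bmp_high < supp_low)                 # False: code unit order disagrees
```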

Clarification:
I am not proposing such a "bastard" encoding form, not even an exact
specification or name for one.
I am only proposing to set aside these 64 code points, U+d7c0..U+d7ff, for internal use.

markus