Re: explicit 20 bit Unicode range limit

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Jan 27 1999 - 13:59:15 EST


Markus,

> I still think that for hex-digit notations and other implementation details
> it would be more convenient and "natural" to not use plane 16 (and up).

This is undoubtedly true, but this has to be an implementation
decision, and not part of the standard.

For UTF-16, the Plane 16 horse is already out of the barn, so to
speak. It is already specified, in the standards (both 10646 and
the Unicode Standard) as available for private use characters.
That cannot be changed now without changing the standards in a way
that has the potential to invalidate someone's use of characters --
not what the maintainers of a standard should do. Even if not
many implementations make use of Plane 16 private use characters
as yet, trying to formalize a UTF that disallows them is tantamount
to asking to pull some characters from the Private Use Area on
the BMP and use them for something else (or disallow their use).
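
As a rough illustration of why Plane 16 is already reachable through
UTF-16, the surrogate-pair arithmetic can be sketched in Python as
below. The function names are made up for this example and come from
no particular library; the constants are the usual UTF-16 ones.

```python
def to_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000  # 20-bit offset into the supplementary range
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high, low):
    """Combine a high/low surrogate pair back into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def plane(cp):
    """Return the plane number (0..16) of a code point."""
    return cp >> 16


# The last surrogate pair decodes to the last code point of Plane 16.
last = from_surrogates(0xDBFF, 0xDFFF)
print(f"U+{last:X} is in plane {plane(last)}")
```

The pair DBFF DFFF lands in Plane 16, and the first code point of
that plane, U+100000, needs 21 bits, so a strict 20-bit limit would
exclude code points that existing UTF-16 data can already contain.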

Unicode implementations already have the option of specifying what
characters they will and will not interpret. Those that ignore
surrogates, and interpret only Plane 0 can happily store everything
in 16 bits and be done with it. Those that use surrogates might
choose not to interpret Plane 16 private use characters, in which
case they would be free to pack and store everything in 20 bits,
as per your proposal, should they wish. But there is no real need
to formalize such a scheme *as a new, standard UTF*. For interchange
of data, either of the existing UTFs (UTF-16, UTF-8) is sufficient
as a public, standard mechanism; furthermore, cooperating processes
can have a private agreement (which might in fact even be a
standardized protocol) to use any other interchange format they
choose.
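
A private 20-bit storage scheme of the kind described above might look
something like the following Python sketch. This is only one possible
layout (big-endian bit packing, zero-padded to a whole byte), not the
format from the original proposal; the names pack20 and unpack20 are
hypothetical. It rejects anything at or above U+100000, which is
exactly the implementation choice of not interpreting Plane 16.

```python
def pack20(code_points):
    """Pack code points U+0000..U+FFFFF into bytes, 20 bits each (big-endian)."""
    acc = 0      # bit accumulator
    nbits = 0    # number of pending bits held in acc
    out = bytearray()
    for cp in code_points:
        if not 0 <= cp <= 0xFFFFF:
            raise ValueError(f"U+{cp:X} does not fit in 20 bits")
        acc = (acc << 20) | cp
        nbits += 20
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1  # keep only the bits not yet written
    if nbits:
        # Pad the final partial byte with zero bits on the right.
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack20(data, count):
    """Recover `count` 20-bit code points from data produced by pack20."""
    total_bits = len(data) * 8
    if count * 20 > total_bits:
        raise ValueError("not enough data for the requested count")
    value = int.from_bytes(data, "big")
    return [(value >> (total_bits - 20 * (i + 1))) & 0xFFFFF
            for i in range(count)]


packed = pack20([0x41, 0xFFFFF, 0x10000])
print(packed.hex(), unpack20(packed, 3))
```

The count has to be carried separately here because the padding bits
are indistinguishable from data, which is the sort of detail a private
agreement between cooperating processes would need to settle.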

--Ken Whistler


