Re: [Proposal] Extended UTF-16 by using Plane 14

From: Geoffrey Waigh (anzu@home.com)
Date: Mon Apr 12 1999 - 00:11:45 EDT


Masahiko Maedera wrote:
>
> Now I have a problem: the area 0x00110000-0x7FFFFFFF of UCS-4
> cannot be mapped by UTF-16.

UCS-4 is strictly an ISO/IEC 10646 construct, so there is no problem
with Unicode not encompassing all of it.
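For illustration, here is a minimal Python sketch of the standard
UTF-16 surrogate-pair arithmetic. The function name to_utf16_units
is made up for this example. Subtracting 0x10000 leaves a 20-bit
value that is split across two surrogates, so nothing above
0x10FFFF can be reached. That upper region is exactly the part of
UCS-4 that Unicode leaves out.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Encode one UCS-4 code point as a list of 16-bit UTF-16 code units.

    Hypothetical helper for illustration; raises ValueError for values
    UTF-16 cannot represent.
    """
    if cp < 0 or cp > 0x10FFFF:
        # 0x110000..0x7FFFFFFF lies outside the 17 planes UTF-16 can address
        raise ValueError(f"U+{cp:X} is not representable in UTF-16")
    if 0xD800 <= cp <= 0xDFFF:
        # surrogate code points cannot be encoded on their own
        raise ValueError(f"U+{cp:X} is a surrogate code point")
    if cp < 0x10000:
        return [cp]                  # BMP: a single code unit
    v = cp - 0x10000                 # 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 | (v >> 10)        # top 10 bits -> high surrogate
    low = 0xDC00 | (v & 0x3FF)       # bottom 10 bits -> low surrogate
    return [high, low]
```

For example, to_utf16_units(0x10FFFF) gives [0xDBFF, 0xDFFF], the
largest pair possible, while to_utf16_units(0x110000) raises
ValueError.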

> I think this area is probably not used right now.
> But if it is used in the future,
> we will have serious conversion and compatibility problems.

There are no proposals to encode anything in the upper reaches of
UCS-4 at this time. It will be quite some time before any proposals
that utilize areas outside the BMP receive final approval. Until we
start to make a dent in the regions UTF-16 already covers, it seems
premature to worry about how UTF-16-based platforms will deal with
the issue.

> In particular, ISO/IEC 10646-1 lets us use the Private Use Areas
> (0x00E00000-0x00FFFFFF, 0x60000000-0x7FFFFFFF),
> and nothing prohibits using these areas now.

However, planes 15 and 16, which are designated for Private Use,
are accessible from UTF-16. My copy of Unicode 2.0 points out
that the private use areas you mention are strongly discouraged
since, as you noticed, they are not representable in UTF-16.
Anyone with a legitimate need for more than roughly 131,000
private code points that are not worth integrating into the
standard will probably have serious trouble finding other
applications willing to process their data.
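A quick check with Python's built-in utf-16-be codec shows planes
15 and 16 encoding as ordinary surrogate pairs. The helper name
utf16_be_units is hypothetical, and the ranges assume the current
Unicode convention that the last two code points of each plane are
noncharacters, which puts the private-use total at 131,068 of the
131,072 code points in the two planes.

```python
# Private Use planes 15 and 16 (assumption: current Unicode, where the
# last two code points of each plane are noncharacters, so each private
# range ends at ...FFFD).
PLANE_15 = range(0xF0000, 0xFFFFE)     # U+F0000..U+FFFFD
PLANE_16 = range(0x100000, 0x10FFFE)   # U+100000..U+10FFFD


def utf16_be_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for cp via Python's built-in codec."""
    data = chr(cp).encode("utf-16-be")
    # Reassemble each pair of big-endian bytes into one 16-bit code unit
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# 65,534 private code points per plane, two planes
private_count = len(PLANE_15) + len(PLANE_16)
```

Here utf16_be_units(0xF0000) gives [0xDB80, 0xDC00] and
utf16_be_units(0x10FFFD) gives [0xDBFF, 0xDFFD]. By contrast,
chr(0x00E00000) raises ValueError, because Python strings
themselves stop at U+10FFFF.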

For those people, however, I suggest using UCS-4, because
the point of choosing UTF-16 over UTF-8 is processing
simplicity.
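To make the processing-simplicity point concrete, here is a rough
Python sketch of bytes per code point in each form. The UTF-8
lengths assume the original RFC 2279 definition, which allowed up
to six bytes in order to reach 0x7FFFFFFF (later UTF-8 is limited
to 0x10FFFF). The function names are illustrative only.

```python
def utf8_len_rfc2279(cp: int) -> int:
    """Bytes for cp in the original (RFC 2279) UTF-8, which covered 31 bits."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS-4 space")
    # Upper bounds (exclusive) for 1- through 5-byte sequences
    for n, limit in enumerate((0x80, 0x800, 0x10000, 0x200000, 0x4000000), start=1):
        if cp < limit:
            return n
    return 6


def utf16_len(cp: int) -> int:
    """Bytes for cp in UTF-16: 2 in the BMP, 4 for a surrogate pair."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not representable in UTF-16")
    return 2 if cp < 0x10000 else 4


def ucs4_len(cp: int) -> int:
    """UCS-4 is fixed-width: every code point takes 4 bytes."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS-4 space")
    return 4
```

For a code point in private-use group 0x60 such as 0x60000000,
utf8_len_rfc2279 returns 6, utf16_len raises ValueError, and
ucs4_len returns 4. Only UCS-4 keeps a fixed width across the
whole space, which is what makes it the simpler choice there.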

Geoffrey Waigh


