Re: [Proposal] Extended UTF-16 by using Plane 14

From: Geoffrey Waigh (
Date: Tue Apr 13 1999 - 15:02:48 EDT

Christian Wittern wrote:
> Yes, it includes all these characters. Nevertheless, these groups find it
> convenient to build their own pool of data. That is exactly why the use of
> the private user area is intended. Of course, in the long run, some more of
> these data might end up in proposals for inclusion in a future extension of
> ISO10646/Unicode.
> The characters in question have deliberately not undergone the unification
> in question, since the preservation of the exact glyph shape is deemed of
> interest. This again is a reason to use the private character area, and
> again, it is a reason the number of characters needed might possibly exceed
> 131000.
> > but cannot get a programmer or contract an
> > existing vendor to supply the relevant utilities in non-UTF-16 form.
> > Given how much trouble people on the list express in trying to find
> > adequate Unicode tools from non-specialized software houses, I'm amazed
> > that an off-the-shelf UTF-16 surrogate capable system was found.
> The reason for this is very simple: This is not a big international company,
> but rather some grass-roots projects, shareware and freeware programmers.
> Also, if you read Mr. Maederas message again, you will find that he is
> programming such an editor and that is why he came up with his proposal. He
> wants to stay compatible with the way other people are handling this and at
> the same time accommodate the needs of these scattered groups.

Ah, then they have a programmer who can write the editor to use UTF-8 as
fits their other tools and coexists with existing standards - and thus
can be compatible with existing implementations rather than an extended
UTF-16 which *cannot* have any implementations and *will not* be deployed
extensively for some time to come.

> And I think, you should at least read the message and understand the problem
> before coming out with such a ridiculous judgement.

I understand the problem quite well. They want to adapt Unicode to encode
a large swath of non-Unicode type data and are now asking the rest of the
world to modify their software so that this non-Unicode data will be
processed in some fashion correctly. There are already 2 encodings for
ISO-10646 which will allow them to store huge quantities of non-Unicode
data compatibly. Given the implication that they would not be using the
existing CJK in the BMP, I would think that both UTF-8 and UCS-4 are more
space- and processing-efficient than a stream of what would be mostly
12-octet sequences in their data. UTF-16 was designed for something
different from what they are trying to accomplish.
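
To make the size comparison concrete, here is a rough sketch in Python of
the octet counts involved. It assumes UTF-8 as defined for ISO 10646 in
RFC 2279 (up to six octets, covering 31 bits) and a fixed four octets for
UCS-4. The 12-octet figure for the proposed extended UTF-16 is simply the
number quoted above, not something derived from the proposal, and the
names utf8_len and compare are made up for illustration.

```python
# Sketch only: per-character octet counts for the encodings discussed above.

UCS4_LEN = 4             # UCS-4 always uses four octets per character
EXTENDED_UTF16_LEN = 12  # assumption: figure quoted in this message, not derived


def utf8_len(cp: int) -> int:
    """Octets needed by UTF-8 as defined in RFC 2279 (code points up to 31 bits)."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    # Largest code point representable in 1, 2, 3, 4, 5 and 6 octets.
    limits = (0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)
    for octets, limit in enumerate(limits, start=1):
        if cp <= limit:
            return octets
    raise ValueError("code point exceeds 31 bits")


def compare(cp: int) -> dict:
    """Octet counts for one code point under each encoding (hypothetical helper)."""
    return {
        "utf8": utf8_len(cp),
        "ucs4": UCS4_LEN,
        "extended_utf16": EXTENDED_UTF16_LEN,
    }


if __name__ == "__main__":
    # A BMP ideograph, a plane 15 private-use code point, and the 31-bit maximum.
    for cp in (0x4E00, 0xF0000, 0x7FFFFFFF):
        print(hex(cp), compare(cp))
```

Under these assumptions UTF-8 never needs more than six octets and UCS-4
never more than four, so either is at most half the size of a 12-octet
sequence for the same character.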

Geoffrey Waigh
