RE: [Proposal] Extended UTF-16 by using Plane 14

From: Christian Wittern (chris@ccbs.ntu.edu.tw)
Date: Tue Apr 13 1999 - 04:57:45 EDT


Geoffrey Waigh wrote:

> Masahiko Maedera wrote:
> >
> > In Japan and Taiwan,
> > some groups already collect about 100,000 Han characters.
>
> Is this 100 000 not including the 20 000+ already encoded in Unicode 2.1?
> Nor the (I believe 6 500+) the IRG has laboured long to have added in the
> near future? Have these 100 000 undergone the extensive unification
> efforts that the other CJK characters in the standard have been subjected
> to? Is there no way for these characters to be encoded more efficiently?

Yes, it includes all these characters. Nevertheless, these groups find it
convenient to build their own pool of data. That is exactly what the
private use area is intended for. Of course, in the long run, some more of
these data might end up in proposals for inclusion in future extensions of
ISO 10646/Unicode.
The characters in question have deliberately not undergone unification,
since the preservation of the exact glyph shape is deemed of
interest. This again is a reason to use the private use area, and
again, it is a reason the number of characters needed might possibly exceed
the roughly 131,000 private-use code points that UTF-16 surrogates provide.
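
For anyone who wants to check that figure, here is a rough sketch in Python.
The ranges are the private-use ranges as I understand them from the standard
(planes 15 and 16, reached through surrogate pairs, plus the smaller area in
the BMP); the names used, including to_surrogates, are my own and purely
illustrative.

```python
# Private-use ranges, assumed here to be as defined for UTF-16:
# the BMP private use area plus planes 15 and 16 (minus their
# last two noncharacter code points each).
BMP_PUA = range(0xE000, 0xF8FF + 1)
PLANE15_PUA = range(0xF0000, 0xFFFFD + 1)
PLANE16_PUA = range(0x100000, 0x10FFFD + 1)

supplementary_pua_count = len(PLANE15_PUA) + len(PLANE16_PUA)
total_pua_count = supplementary_pua_count + len(BMP_PUA)


def to_surrogates(cp):
    """Split a supplementary code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # top 10 bits
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits
    return high, low


print(supplementary_pua_count)   # private-use code points in planes 15-16
print(total_pua_count)           # including the BMP private use area
print([hex(u) for u in to_surrogates(0xF0000)])
```

A collection that keeps every glyph variant apart could outgrow this space,
which is the concern behind the proposal.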

>
> > For example,
> > they use Perl based on UTF-8 and some text editor base on UTF-16.
> > They have anxiety that they collect over 128K Han characters in future,
> > because they don't have enough skill to construct UCS-4 libraries.
> > I am anxious about them.
>
> I would suggest then they get a text editor that works solely with UTF-8
> that supports the entire UCS-4 range. I'm baffled as to how they can
> secure the resources to analyze and work with 100 000 distinct characters
> (and presumably have a means of correctly identifying which of those
> 100 000 any particular sample they come across corresponds to - an
> impressive feat in itself given that the source documents are in all
> likelihood non-digital,)

It's amazing what cooperative internet projects can do, isn't it :-)

> but cannot get a programmer or contract an
> existing vendor to supply the relevant utilities in non-UTF-16 form.
> Given how much trouble people on the list express in trying to find
> adequate Unicode tools from non-specialized software houses, I'm amazed
> that an off-the-shelf UTF-16 surrogate capable system was found.

The reason for this is very simple: This is not a big international company,
but rather some grassroots projects, shareware and freeware programmers.
Also, if you read Mr. Maedera's message again, you will find that he is
programming such an editor and that is why he came up with his proposal. He
wants to stay compatible with the way other people are handling this and at
the same time accommodate the needs of these scattered groups.

>
> I'm quite willing to deal with numerous complexities when implementing
> Unicode (like composing character sequences,) which are clearly necessary
> to efficiently implement various people's scripts. I'm extremely uneager
> to saddle every implementation of UTF-16 on the planet with an extra
> escape mechanism because someone with extreme personal (remember this is
> Private Use only,) needs couldn't readily get software based on UCS-4 or
> UTF-8.

And I think you should at least read the message and understand the problem
before coming out with such a ridiculous judgement.

Christian Wittern

Dr. Christian Wittern
Chung-Hwa Institute of Buddhist Studies
276, Kuang Ming Road, Peitou 112, Taipei, TAIWAN


