Re: [Proposal] Extended UTF-16 by using Plane 14

From: Geoffrey Waigh (anzu@home.com)
Date: Tue Apr 13 1999 - 04:12:24 EDT


Masahiko Maedera wrote:
>
> In Japan and Taiwan,
> some groups already collect about 100,000 Han characters.

Is this 100 000 not including the 20 000+ already encoded in Unicode 2.1,
or the (I believe) 6 500+ that the IRG has laboured long to have added in
the near future? Have these 100 000 undergone the extensive unification
efforts that the other CJK characters in the standard have been subjected
to? Is there no way for these characters to be encoded more efficiently?

> For example,
> they use Perl based on UTF-8 and some text editor based on UTF-16.
> They have anxiety that they collect over 128K Han characters in future,
> because they don't have enough skill to construct UCS-4 libraries.
> I am anxious about them.

I would suggest, then, that they get a text editor that works solely with
UTF-8 and supports the entire UCS-4 range. I'm baffled as to how they can
secure the resources to analyze and work with 100 000 distinct characters
(and presumably have a means of correctly identifying which of those
100 000 any particular sample they come across corresponds to - an
impressive feat in itself, given that the source documents are in all
likelihood non-digital), but cannot get a programmer or contract an
existing vendor to supply the relevant utilities in non-UTF-16 form.
Given how much trouble people on the list express in trying to find
adequate Unicode tools from non-specialized software houses, I'm amazed
that an off-the-shelf, surrogate-capable UTF-16 system was found.
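
As a rough illustration of the arithmetic involved, here is a minimal
sketch in Python. The function name to_surrogate_pair and the sample code
point U+20000 are assumptions chosen for the example, not anything taken
from the groups' actual tools. It shows how a code point beyond U+FFFF
maps to a UTF-16 surrogate pair, how the same code point looks in UTF-8,
and how many code points the private-use high surrogates (DB80..DBFF)
can address, which is presumably where the 128K figure comes from.

```python
def to_surrogate_pair(cp):
    """Split a code point in U+10000..U+10FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the surrogate-addressable range")
    v = cp - 0x10000                    # 20-bit offset into the supplementary range
    high = 0xD800 + (v >> 10)           # top 10 bits select the high surrogate
    low = 0xDC00 + (v & 0x3FF)          # bottom 10 bits select the low surrogate
    return high, low


# Every high surrogate combines with 1024 low surrogates (DC00..DFFF).
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1

# All surrogate pairs together: D800..DBFF times DC00..DFFF.
ALL_PAIRS = (0xDBFF - 0xD800 + 1) * LOW_SURROGATES

# Pairs whose high surrogate is a private-use high surrogate (DB80..DBFF).
PRIVATE_USE_PAIRS = (0xDBFF - 0xDB80 + 1) * LOW_SURROGATES

cp = 0x20000                            # hypothetical sample code point
high, low = to_surrogate_pair(cp)
utf8 = chr(cp).encode("utf-8")          # same code point as a UTF-8 byte sequence

print(hex(high), hex(low))              # 0xd840 0xdc00
print(utf8.hex())                       # f0a08080
print(ALL_PAIRS, PRIVATE_USE_PAIRS)     # 1048576 131072
```

Note that Python's built-in UTF-8 codec stops at U+10FFFF, whereas UTF-8
as specified in RFC 2279 allows sequences of up to six bytes covering
31-bit UCS-4 values, so this sketch only exercises the part of the range
that UTF-16 can also reach.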

I'm quite willing to deal with numerous complexities when implementing
Unicode (like composing character sequences), which are clearly necessary
to efficiently implement various people's scripts. I'm extremely reluctant
to saddle every implementation of UTF-16 on the planet with an extra
escape mechanism because someone with extreme personal needs (remember,
this is Private Use only) couldn't readily get software based on UCS-4 or
UTF-8.

Geoffrey Waigh


