Re: [Proposal] Extended UTF-16 by using

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Apr 14 1999 - 21:14:13 EDT


Maedera san asserted:

> But no one can perfectly deny the probability that
> the Unicode Standard will define a 1114113th character (beyond UTF-16).

Well, I can perfectly deny that probability. In addition
to John Jenkins' valid argument about the unlikelihood that
there are that many characters that anyone would want to
encode, there is also the issue of TIME TO ENCODE.

Let's put some historical perspective on this. The current
draft of Unicode 3.0 (due out fairly soon) has 47200 encoded
graphic characters. The process of getting to the number
47200 can generously be estimated to have taken more than
10 years. (The Unicode effort got down to serious repertoire
collection in 1989 for most scripts, but the Unified Han
repertoire was in the works even before that.) What that
means is that with the concerted effort of a very determined
industrial consortium and a very active and surprisingly
quick-acting international standards body (ISO/IEC JTC1/SC2/WG2),
and with everyone pressing very hard and working as fast as
they could, we have been encoding, on average, 4720 characters
per year for the last decade for the UCS.

Now, there are *actually* 974,529 code points available for
encoding via UTF-16 (BMP: 65536 - 33 controls - 6400 private
use - 2048 surrogates - 2 noncharacters; other planes:
14 x 65534).
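
The tally in that parenthetical can be reproduced in a few lines of
Python. This is only a sketch of the arithmetic as stated; the variable
names are illustrative, and the exclusion counts are taken straight from
the figures quoted above rather than from any authoritative table.

```python
# Reproduce the code point tally quoted in the message.
# All counts are the figures given in the text; names are illustrative only.
PLANE_SIZE = 65536

bmp_exclusions = {
    "controls": 33,
    "private use": 6400,
    "surrogates": 2048,
    "not a character": 2,
}
bmp_available = PLANE_SIZE - sum(bmp_exclusions.values())

# 14 further planes, each counted as 65534 code points in the text
# (a full plane minus its last two code points, xxFFFE and xxFFFF).
other_planes_available = 14 * (PLANE_SIZE - 2)

total_available = bmp_available + other_planes_available
print(bmp_available, other_planes_available, total_available)
```

The count of 14 planes is presumably the 16 supplementary planes
reachable through surrogates minus the two private-use planes; that
reading is an assumption, since the message does not spell it out.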

*If* we assume the current high level of activity will continue
unabated indefinitely (and while there will be a lull after
Unicode 3.0 and 10646-1 2nd edition are published, there are
indeed many thousands more Han characters in the IRG hopper
at the moment), it will take the combined efforts of the
Unicode Consortium and WG2 roughly:

         974,529 / 4720 = 206 years

to finish the task of exhausting all available code points
for UTF-16. And in my opinion, that is a very serious underestimate
of the time, since the pressure right now is to *slow* the
process down drastically, so that people don't have to keep
up with the horrendous pace of dealing with 4720 new
characters every year.

>
> If there is no UTF-16 encoding,
> all persons must throw away
> their favourite software based on UCS-2 right away.
> If this were to happen, many users might complain
> to software vendors, not to the Unicode Standard.

If all people haven't thrown away their favourite era-1999
software based on UCS-2 by the year 2205, I will eat my
shorts. And you are cordially invited to my place to watch me do it.
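
For anyone who wants to check the time estimate, here is a minimal
Python sketch of the same division. It assumes, as the argument above
does, a constant rate of 47200 characters per 10 years, and it takes
1999 as the starting year; the function name is made up for
illustration.

```python
# Rough time-to-exhaustion estimate, using only figures quoted in this message.
def years_to_exhaust(available, encoded_so_far, years_so_far):
    """Years needed to fill `available` code points at the historical average rate."""
    rate_per_year = encoded_so_far / years_so_far
    return available / rate_per_year


years = years_to_exhaust(available=974_529, encoded_so_far=47_200, years_so_far=10)
print(f"{years:.1f} years, i.e. around the year {1999 + int(years)}")
```

Like the estimate in the message, this divides the full total of
available code points rather than only the unassigned remainder, and it
ignores any slowdown in the encoding rate, so it is best read as an
order-of-magnitude figure.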

--Ken Whistler


