Re: [Proposal] Extended UTF-16 by using

From: John Jenkins (jenkins@apple.com)
Date: Wed Apr 14 1999 - 20:09:37 EDT


> Dear everybody,
>
> But no one can completely rule out the possibility that the
> Unicode Standard will define a 1,114,113th character (beyond UTF-16).
>
> If there were no UTF-16 encoding,
> everyone would have to throw away
> their favourite software based on UCS-2 right away.
> If this happened, many users might complain
> to software vendors, not to the Unicode Standard.
>
> I am anxious that a similar thing will happen
> if there is no Extended UTF-16 mechanism.
>

Even the most generous estimates I've seen of how many characters there are
to encode fall far short of requiring even one million code points. For
example, the *largest* estimate for the number of Han ideographs --
characters, mind, not glyphs -- is less than 120,000. Catalogs of scripts for
encoding are fairly complete, and lists of the possible characters needed for
them are easy to come up with. There is absolutely *zero* evidence that there
may be a need for this many code points.

But suppose it happens. Suppose we *do* need more than a million code
points. We've faced a similar situation before. Unicode had originally
determined that 65,534 code points would be more than enough to do what
needed to be done -- but that estimate proved to be too small. Once that
became clear, the surrogate mechanism was developed -- long before we got to
the point where it was actually needed.
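
For anyone who wants the arithmetic behind that limit, here is a minimal
sketch of how surrogate pairs extend the 16-bit range. The function and
constant names are my own illustrations, not anything taken from the
standard's text; the ranges are the usual ones (high surrogates from U+D800,
low surrogates from U+DC00, 1,024 of each).

```python
# Illustrative sketch of UTF-16 surrogate arithmetic.
# All names below are hypothetical, chosen for this example only.
HIGH_START = 0xD800          # first high (leading) surrogate
LOW_START = 0xDC00           # first low (trailing) surrogate
SURROGATES_PER_HALF = 0x400  # 1,024 high and 1,024 low surrogates


def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into two 16-bit code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not representable as a surrogate pair")
    offset = code_point - 0x10000        # a 20-bit value
    high = HIGH_START + (offset >> 10)   # top 10 bits select the high surrogate
    low = LOW_START + (offset & 0x3FF)   # bottom 10 bits select the low surrogate
    return high, low


# Code points reachable: the 16-bit range plus 1,024 * 1,024 surrogate pairs.
UTF16_CODE_POINTS = 0x10000 + SURROGATES_PER_HALF * SURROGATES_PER_HALF

print(UTF16_CODE_POINTS)                               # 1114112
print([hex(u) for u in to_surrogate_pair(0x10FFFF)])   # ['0xdbff', '0xdfff']
```

So the "1,114,113th character" in the quoted message is simply the first one
that would not fit: the pairs top out at U+10FFFF.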

IOW there is no reason to solve this problem right now. There is no
evidence that it *is* a problem. If it ever *does* become a real problem,
we'll know it sufficiently far in advance that we can put a solution in
place and have it adopted before it really hits us.

But let's not solve it now based on a vague fear that it might eventually
happen.

=====
John H. Jenkins
jenkins@apple.com
tseng@blueneptune.com
http://www.blueneptune.com/~tseng


