RE: Unicode, Cure-all or Kill-all?

From: Murray Sargent (
Date: Mon Aug 12 1996 - 18:48:08 EDT

The question "When can we expect to see UTF-16" has two answers, one of
which Ken gives below, namely now, since it is most definitely part of
Unicode 2.0.

The second answer would reveal when you could actually use UTF-16
comfortably. I don't know the details of this second answer, but here's
a thought about it. We need at least four interrelated things:

1) standardized code points for a sufficiently valuable set of
characters above the basic multilingual plane (BMP)
2) readily available fonts that have these characters
3) UTF-16 (or UCS-4) support in operating-system text-display functions
4) appropriate generalizations to the applications themselves

Items 2 through 4 involve a significant amount of effort on the part of
software and font vendors, so they need good business cases to justify
that effort. Item 1 is key to making those business cases.

Judging by the time it has taken to support Unicode itself without
UTF-16, the code points have to be in place for a while before the fonts
and software become available to handle them. To this end, hopefully
character standards bodies will receive good proposals for characters
above the BMP real soon. For example, simple arithmetic reveals that
the BMP cannot accommodate all the desired Han characters and therefore
that the majority of the unencoded Han characters need to be encoded
above the BMP. The sooner the choices are made, the sooner we'll see
computing environments that support them using either UTF-16 or UCS-4.
My guess is that software will favor UTF-16 over UCS-4 because it's hard
to justify switching all text to four bytes when characters outside the
BMP are used relatively rarely.
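
To make the storage argument concrete, here is a rough sketch in Python. It
assumes Python's "utf-32-le" codec is a fair stand-in for UCS-4 (four bytes
per character) and "utf-16-le" for UTF-16 without a byte-order mark. The
function name and sample strings are made up for illustration, and U+10000
is used only as an arbitrary code point above the BMP.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-16 and in a UCS-4-like form."""
    return {
        # Two bytes per BMP character, four per character above the BMP.
        "utf16_bytes": len(text.encode("utf-16-le")),
        # Always four bytes per character ("utf-32-le" stands in for UCS-4).
        "ucs4_bytes": len(text.encode("utf-32-le")),
    }


# Hypothetical sample strings: one BMP-only, one with a single
# character above the BMP appended.
bmp_only = "Unicode 2.0"
with_supplementary = "Unicode 2.0 " + chr(0x10000)

for sample in (bmp_only, with_supplementary):
    print(len(sample), encoded_sizes(sample))
```

For text that is mostly BMP, the four-byte form is close to twice the size
of the UTF-16 form, and the occasional character above the BMP costs UTF-16
only two extra bytes.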

>-----Original Message-----
From: unicode@Unicode.ORG [SMTP:unicode@Unicode.ORG]
Sent: Monday, August 12, 1996 2:21 PM
To: unicode@Unicode.ORG
Subject: Re: Unicode, Cure-all or Kill-all?

>>With UTF-16, Unicode has a codespace of about 1000000 codepoints.
>>That's enough for at least the next 500 years.
> Question 1) When? 2) Is that ISO 4? 3) Is it 256 x 256 x256 x256?

When: Now. It is part of Unicode 2.0, which is an accepted standard.

Is that ISO 4?: I assume you mean UCS-4 in ISO 10646. No. It is
           UTF-16 in ISO 10646.

Is it 256 x 256 x 256 x 256? No. It is 1024 x 1024, transformable to
           UCS-4 code points at U-00010000 to U-0010FFFF.

--Ken Whistler
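
The "1024 x 1024" mapping above can be sketched as follows. This is an
illustrative Python version, not text from the standard: it assumes the
Unicode 2.0 surrogate ranges (high surrogates D800 to DBFF, low surrogates
DC00 to DFFF), and the function and constant names are invented here.

```python
from typing import Tuple

HIGH_START = 0xD800          # first high surrogate (1024 values, to 0xDBFF)
LOW_START = 0xDC00           # first low surrogate (1024 values, to 0xDFFF)
SUPPLEMENTARY_BASE = 0x10000 # first code point above the BMP


def to_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a code point in 0x10000..0x10FFFF into a surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in the range reachable by UTF-16 pairs")
    offset = code_point - SUPPLEMENTARY_BASE   # 20-bit value, 0..0xFFFFF
    high = HIGH_START + (offset >> 10)         # top 10 bits
    low = LOW_START + (offset & 0x3FF)         # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a high/low surrogate pair back into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return SUPPLEMENTARY_BASE + ((high - HIGH_START) << 10) + (low - LOW_START)


# 1024 high surrogates times 1024 low surrogates cover exactly the
# range 0x10000 through 0x10FFFF.
assert 1024 * 1024 == 0x10FFFF - 0x10000 + 1
```

The two ends of the range map to the pairs (D800, DC00) and (DBFF, DFFF),
which is where the 1024 x 1024 = 1,048,576 figure comes from.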
