RE: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

From: Marco Cimarosti (marco.cimarosti@essetre.it)
Date: Tue Feb 20 2001 - 04:58:56 EST


Doug Ewell wrote:
> "A 16-bit character encoding standard [...]
> By contrast, 8-bit ASCII [...]

These two statements are regularly found together, but it is the second one
that makes me despair.

If nearly half a century was not enough time for people to learn that ASCII
is a 7-bit encoding, how long will it take to fix the Unicode misconception?

Moreover, the analogy between the two statements above is illusory: the
Unicode misconception is much bigger than the ASCII one.

In fact, it *does* make sense to say that "ASCII is an n-bit encoding". The
only problem is that the correct value for n is 7, not 8.

But in the case of Unicode it is not possible to replace "16" with the
correct number, because there is no correct number!

When I have tried fighting the 16-bit misconception, I have found myself drawn
into a long explanation (versions, surrogates, UTFs, how many Chinese
ideographs...), at the end of which my interlocutors normally ask: "So, how
many bits does it have?"

How about considering UTF-32 as the default Unicode form, so that we could
provide a short answer of this kind:

        "Unicode is now a 32-bit character encoding standard, although only
about one million codes actually exist, and there are ways of
representing Unicode characters as sequences of 8-bit bytes or 16-bit
words."

_ Marco


