Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

From: J%ORG KNAPPEN (KNAPPEN@ALPHA.NTP.SPRINGER.DE)
Date: Tue Feb 20 2001 - 07:02:06 EST


Doug Ewell wrote:

> A few days ago I said there was a "widespread belief" that Unicode is a
> 16-bit-only character set that ends at U+FFFF. A corollary is that the
> supplementary characters ranging from U+10000 to U+10FFFF are either
> little-known or perceived to belong to ISO/IEC 10646 only, not to Unicode.

This still echoes the marketing hype of Unicode 1.0 (which was before the
merger with ISO 10646).

> At least one list member questioned whether this belief was really widespread.

Since there was much noise about Unicode 1.0, this belief is widely
implemented. Only the technical experts who keep up with the updates know better.

> "A 16-bit character encoding standard developed by the Unicode Consortium
> between 1988 and 1991. By using two bytes to represent each character,
> Unicode enables almost all of the written languages of the world to be
> represented using a single character set. By contrast, 8-bit ASCII is not
> capable of representing all of the combinations of letters and diacritical
> marks that are used just with the Roman alphabet.

A little out of date, but it correctly describes the state of the art in 1991,
before the merger. Even "8-bit ASCII" is a correct term, meaning ISO-8859-1.
A nit to pick: it's the Latin alphabet, not Roman. Roman is a kind of typeface,
in contrast to sans serif, aka grotesque.
 
> "Approximately 39,000 of the 65,536 possible Unicode character codes have
> been assigned to date, 21,000 of them being used for Chinese ideographs. The
> remaining combinations are open for expansion.

Also true (there were no Hangul syllables at that time).

> "See also ASCII."

> Exercise for the reader: See how many misstatements about Unicode (and
> ASCII) you can find in this text.

Fewer than you might expect. It is only that the target described no longer
exists. Since the merger with ISO 10646 was foreseeable even at that time, there
are no implementations of Unicode 1.0 anyway.

--J"org Knappen


