Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)

From: DougEwell2@cs.com
Date: Tue Feb 20 2001 - 11:27:04 EST

Next message: John Hudson: "Re: Implementing Complex Unicode Scripts"
Previous message: DougEwell2@cs.com: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in Unicode)"
Next in thread: Antoine Leca: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: Antoine Leca: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: Peter_Constable@sil.org: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: John Hudson: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: John Hudson: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: Peter_Constable@sil.org: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: John Cowan: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: Roozbeh Pournader: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: John Hudson: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: DougEwell2@cs.com: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: DougEwell2@cs.com: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: Peter_Constable@sil.org: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: Tex Texin: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: Michael Everson: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: Erland Sommarskog: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"
Maybe reply: DougEwell2@cs.com: "Re: Perception that Unicode is 16-bit (was: Re: Surrogate space in"

In a message dated 2001-02-20 04:21:49 Pacific Standard Time,
KNAPPEN@ALPHA.NTP.SPRINGER.DE writes:

> A little out of date, but describing correctly the state of art in 1991
> before the merger.

Agreed, but the example was from Windows 2000. It should at least be current
through Unicode 2.1.

> Even 8-bit ASCII is a correct term meaning ISO-8859-1.

I would question that. Understandable, yes, but not really correct.
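To illustrate why "8-bit ASCII" is not really correct: ASCII defines only 128 codes (0x00 to 0x7F), so it is a 7-bit code, while ISO-8859-1 is a separate 8-bit character set whose first 128 positions happen to coincide with ASCII. Here is a minimal sketch using Python's standard "ascii" and "latin-1" codecs; the helper name `fits` is just for illustration.

```python
def fits(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

sample = "caf\u00e9"  # "cafe" with U+00E9 LATIN SMALL LETTER E WITH ACUTE

print(fits(sample, "ascii"))    # False: ASCII covers only 0x00-0x7F (7 bits)
print(fits(sample, "latin-1"))  # True: ISO-8859-1 covers 0x00-0xFF (8 bits)

# The two agree on the first 128 positions, which is why the informal
# name "8-bit ASCII" is understandable even though it is not accurate.
low_half = bytes(range(128))
print(low_half.decode("ascii") == low_half.decode("latin-1"))  # True
```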

> A nit to pick: It's the latin alphabet, not roman. Roman is a kind of
> typeface, contrasting to sans serif aka grotesque.

True. I have also heard "roman" used to mean the opposite of italic.

> > Exercise for the reader: See how many misstatements about Unicode (and
> > ASCII) you can find in this text.
>
> Fewer than you expect. Only the target described does not exist any longer.
> Since the merger with ISO 10646 was foreseeable even at that time, there are
> no implementations of Unicode 1.0 anyway.

Here is my list. Remember that I am expecting information supplied with
Windows 2000 to be current through Unicode 2.1.

> A 16-bit character encoding standard

Wrong; surrogates have existed since about 1993 (someone help me with the
exact date).
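For anyone who hasn't looked at how surrogates work, this is why "16-bit" is wrong: a character beyond U+FFFF is represented as a *pair* of 16-bit code units, a high surrogate (D800-DBFF) followed by a low surrogate (DC00-DFFF). A minimal Python sketch of the arithmetic follows; the function names are hypothetical, and the sample character is just a convenient supplementary code point, cross-checked against Python's built-in UTF-16 codec.

```python
from typing import Tuple

def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF needs a surrogate pair")
    offset = cp - 0x10000              # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low

def from_surrogates(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

cp = 0x1D11E  # MUSICAL SYMBOL G CLEF, used here only as an example
high, low = to_surrogates(cp)
print(hex(high), hex(low))            # 0xd834 0xdd1e
print(hex(from_surrogates(high, low)))  # 0x1d11e

# Cross-check with Python's own UTF-16 encoder (big-endian, no BOM):
units = chr(cp).encode("utf-16-be")
print(units.hex(), len(units) // 2)   # d834dd1e 2  (two 16-bit units, one character)
```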

> developed by the Unicode Consortium between 1988 and 1991.

This implies that development was finished in 1991 and that only new characters
have been added since. In fact, a great deal of new development has taken place
since then (just look at all the Technical Reports). This might be splitting hairs.

> By using two bytes to represent each character,

Even "16 bits" would be better than "two bytes" here, but again this is
nit-picking.

> Unicode enables almost all of the written languages of the world to be
> represented using a single character set.

Hey, they got something right!

> By contrast, 8-bit ASCII

Mentioned above.

> is not capable of representing all of the combinations of letters and
> diacritical marks that are used just with the Roman alphabet.

I thought "Roman" was simply an alternate word for "Latin," but Jorg is
correct. This is also an error.

> Approximately 39,000 of the 65,536 possible Unicode character codes have
> been assigned to date, 21,000 of them being used for Chinese ideographs.

The count was correct once, but that was 10 years ago.

> The remaining combinations are open for expansion.

"Combinations"? You mean of two bytes?
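The "65,536 possible character codes" figure treats Unicode as a fixed two-byte space. Once surrogate pairs are counted, the code space is considerably larger. A quick worked calculation in Python, using only the surrogate ranges D800-DBFF and DC00-DFFF (the variable names are just for illustration):

```python
BMP = 0x10000                 # 65,536 code points reachable with one 16-bit unit
HIGH = 0xDBFF - 0xD800 + 1    # 1,024 high (leading) surrogates
LOW = 0xDFFF - 0xDC00 + 1     # 1,024 low (trailing) surrogates

supplementary = HIGH * LOW    # 1,048,576 code points reachable only via pairs
total = BMP + supplementary   # 1,114,112 code points in all

print(total, total // BMP)    # 1114112 17  (i.e. 17 planes of 65,536)
# The 2,048 surrogate code points themselves never stand for characters:
print(total - (HIGH + LOW))   # 1112064 non-surrogate code points
```

So the "remaining combinations" of two bytes are not where the room for expansion lies; most of it is beyond U+FFFF.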

Well, that's about enough. I am not a habitual Microsoft basher, but
somebody in their Help department really needs to update the information
distributed with their OS. Tex is right that we are bound to see a certain
amount of misinformation, but it is our duty to help correct it.

-Doug Ewell
Fullerton, California


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:19 EDT