Re: A really bad article about Unicode

From: Mark Davis (markdavis@ispchannel.com)
Date: Tue May 02 2000 - 02:40:22 EDT


Good point -- we'll remove that one.

Mark

Doug Ewell wrote:

> The Unicode Web site includes a page titled "Press Information" at
>
> http://www.unicode.org/press/
>
> This page contains links to a variety of articles, written by members of
> the general computing press, that have something to do with Unicode.
>
> One of the articles, written by Amy Burns of Microsoft, is called
> "Unicode, UTF-8, UCS-2, UCS-4 ... What Is All of This?" and is available
> at
>
> http://msdn.microsoft.com/workshop/management/intl/unicode.asp
>
> This article, written in January 1998 and intended for designers of Web
> pages, is so full of misinformation, inappropriate emphasis, and just
> plain silly errors that I wonder why the Unicode Consortium chose to
> include it in what appears to be a list of "recommended" articles.
>
> In a section called "So Who's in and Who's out?" there is a gratuitous
> discussion of the difference between "primary" and "pseudo" scripts,
> which is unlikely to tell a Web page designer much except that Unicode
> is big and complicated and scary. This is followed by lists of supported
> scripts, and longer lists of unsupported ones. The overall impression is
> that the Unicode architects have chosen to exclude a great many scripts,
> seemingly arbitrarily.
>
> "UCS-2" is identified one-for-one with "Unicode," whereas "UCS-4"
> (mistyped "USC-4" as often as not) is identified one-for-one with "ISO
> 10646." This is misleading, considering that the repertoires of Unicode
> and ISO 10646 are identical, and plans to encode Unicode characters in
> the Astral Planes were well known before 1998.
>
> The next paragraph not only reverses this artificial 2-byte/4-byte
> distinction, but is absolutely the worst description of Unicode I have
> ever seen:
>
>     This means if you are using Unicode, your text is being broken up
>     every four bytes and sent through the ozone to be reconstructed at
>     the other end. If you are using "supported" scripts, you're okay.
>     It will put your words back the way it found them when they reach
>     their destination. If you use a language that Unicode does not
>     currently support, your text will appear corrupted at the other end.
>     Perhaps the words will be munged, or extra spaces will be added, or
>     some other creative interpretation.
>
> Just imagine yourself sitting in a meeting room, listening to someone
> describe Unicode to a Web designer this way.
>
> Burns then begins a cursory discussion of UTF-8, which she says "allows
> 32-bit encoding of ISO 10646, and breaks up your characters between each
> byte instead of every four bytes." O-kay, I'm glad we cleared that up.
> (At least the UTF-8 examples are correct.)
>
> The article concludes with another Unicode-is-scary statement:
>
>     Diving into the depths of Unicode gets to be a serious lesson in
>     octets, binary, division and positive visualization.
>
> but it sounds like Burns is the one with a serious case of the bends.
>
> With all the Microsoft experts on this mailing list, it should be easy
> to find an article written by someone from Microsoft that expresses some
> knowledge and understanding about Unicode. The Burns article reads like
> a bad junior high school essay, and does not deserve to be linked to the
> Unicode Web site.
>
> -Doug Ewell
> Fullerton, California


