From: Doug Ewell (firstname.lastname@example.org)
Date: Sun Sep 17 2006 - 17:52:45 CDT
This sounds remarkably like the study by Steven Atkin and Ryan
Stansifer, quoted in UTN #14, which attempted to prove that 8-bit legacy
encodings, each optimized for a single language or family of languages,
are superior to Unicode because they encode those languages in fewer
bytes, and because a particular compression scheme (Burrows-Wheeler)
compresses all encodings roughly equally.
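To make the "fewer bytes" part of that argument concrete, here is a small sketch using Python's standard codecs. The Greek sample phrase is my own choice for illustration, not something taken from the study; the point is only that a legacy 8-bit charset spends 1 byte per Greek letter while UTF-8 and UTF-16 spend 2.

```python
# Compare raw (uncompressed) byte counts for one Greek phrase in a legacy
# 8-bit charset and in two Unicode encoding forms. The sample text is an
# arbitrary illustration.
SAMPLE = "Καλημέρα κόσμε"


def byte_counts(text: str) -> dict[str, int]:
    return {
        "ISO-8859-7": len(text.encode("iso8859_7")),  # 1 byte per character
        "UTF-8": len(text.encode("utf-8")),           # 2 bytes per Greek letter
        "UTF-16": len(text.encode("utf-16-be")),      # big-endian, so no BOM is added
    }


if __name__ == "__main__":
    for name, n in byte_counts(SAMPLE).items():
        print(f"{name:>10}: {n} bytes")
```

For this 14-character phrase the counts are 14, 27 and 28 bytes respectively, which is the kind of gap such studies point to.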
Better support for SCSU over the past 8 years or so, from Unicode and
from industry, might have been able to put these complaints to rest.
SCSU compresses most non-CJK text to 1 byte per character, and most CJK
text to 2 bytes per character, the same as legacy charsets. Because
SCSU was relegated to the realm of "a higher-level protocol" and Unicode
continued to be represented until 2001 as primarily a 16-bit encoding,
industry support for this very useful encoding scheme never got off the
ground.
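As a rough illustration of how SCSU gets alphabetic text down to 1 byte per character, here is a deliberately minimal sketch of its single-byte mode as described in UTS #6: the tag byte SD0 (0x18) defines dynamic window 0 at offset (next byte) × 0x80, after which ASCII passes through unchanged and each byte 0x80-0xFF stands for offset + (byte - 0x80). The function name and its restrictions are my own assumptions: it handles only text whose non-ASCII characters fit in one 128-character window below U+3400, and it omits quoting, window switching, and the Unicode mode (SCU) that SCSU uses for CJK at 2 bytes per character. It is not a conforming encoder.

```python
# Hypothetical minimal SCSU encoder (single-byte mode, one dynamic window).
# Only handles text whose non-ASCII characters share one 128-code-point
# window below U+3400; everything else raises ValueError.
SD0 = 0x18                         # tag: define dynamic window 0 and select it
PASS_THROUGH = {0x09, 0x0A, 0x0D}  # C0 controls passed literally (NUL omitted here)


def scsu_encode_one_window(text: str) -> bytes:
    non_ascii = {ord(ch) for ch in text if ord(ch) >= 0x80}
    blocks = {cp // 0x80 for cp in non_ascii}
    if len(blocks) > 1:
        raise ValueError("sketch supports only one 128-character window")
    out = bytearray()
    offset = 0x0080  # default offset of dynamic window 0
    if blocks:
        block = blocks.pop()
        if block > 0x67:
            raise ValueError("offset needs an extended window byte or Unicode mode")
        offset = block * 0x80
        out += bytes([SD0, block])  # window 0 now starts at block * 0x80
    for ch in text:
        cp = ord(ch)
        if cp >= 0x80:
            out.append(0x80 + (cp - offset))  # 1 byte for an in-window character
        elif cp >= 0x20 or cp in PASS_THROUGH:
            out.append(cp)                    # ASCII is stored as itself
        else:
            raise ValueError("other C0 controls need SQ0 quoting, not handled here")
    return bytes(out)


if __name__ == "__main__":
    phrase = "Καλημέρα κόσμε"
    print(len(phrase), len(scsu_encode_one_window(phrase)))
```

For the 14-character Greek phrase this yields 16 bytes: 2 bytes of window setup, then 1 byte per character, against 27 bytes in UTF-8.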
I would add that the heading "English bias" perpetuates a common and
destructive myth. 8-bit legacy encodings exist that support dozens of
languages besides English. To the extent that C and database
development tools exhibit a "bias" (which the passage does not prove),
it is a bias in favor of 8-bit legacy encodings, not the English
language.