Re: Is there a UTF that allows ISO 8859-1 (latin-1)?

From: Glen Perkins
Date: Tue Aug 25 1998 - 15:56:26 EDT

-----Original Message-----
From: Gunther Schadow <>
To: Unicode List <>
Date: Monday, August 24, 1998 12:44 PM
Subject: Re: Is there a UTF that allows ISO 8859-1 (latin-1)?

>So, if you are one of those highly respected members of the world
>population who prefers writing in Greek, Kyrillic, Chinese, Japanese,
>Devanagari, Thai, or Malayalam, I do not ask you to bother with me for
>an ISO Latin-1 compatible UTF. But I ask you to think why an
>Anglo-Americanocentric UTF is good while a UTF for all scripts based
>on Latin is so bad and politically incorrect to call for (BTW:
>wouldn't vietnamese be supported by ISO Latin-1 as well?).

No. Latin-1 is not at all sufficient for "all scripts based on Latin". It's

>If the
>Khmer script were integrated into Unicode in some backward compatible
>manner, like to always strip or add some bits from your code page, I
>would certainly support your call for a UTF that facilitates a
>graceful transformation of the Terabytes of legacy software used and
>produced in Kampuchea. I am open to this.
>But may I please ask you (especially the US-residents among the
>fighters for political correctness) at least not to interfere with a
>call for a UTF that is as compatible as Unicode is by itself? I think
>that the issue with UTF-7 and UTF-8 is more about broadening the
>narrow Anglo-American view on the world than to narrow the beautiful
>global view of Unicode towards an Euro-centrism.

Just a quick note regarding this "narrow Anglo-American-view" nonsense. The
VAST majority of 1990s Angles and Americans use an 8-bit encoding, such as
Windows Code Page 1252, for the vast majority of documents. While it would
certainly be easier for an American to stick to 7-bit ASCII than for a
German to do so--if he had to--in practice Americans (and Brits) almost
never do so because they don't have to.

Most Americans and Brits use Windows or Macintosh for most documents. Win
and Mac have used 8-bit encodings from day one. This discussion is all about
"legacy documents". Well, outside of some email and unix configuration
files, you simply can't assume that a US or UK document is encoded in 7-bit
"US-ASCII". It almost never is (even if some header claims it is).

We use curly quotes and apostrophes, we use diacritics on many common
English words and to write the names of a large number of our citizens. And,
our business documents overflow with bullets and trademark symbols, and
these characters aren't even in Latin-1! We have to go beyond Latin-1 just
to write a typical US business letter, and we don't hesitate to do so
because everyone uses Windows (CP1252), right? ;-)
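
To make that concrete, here is a minimal Python sketch that checks which of
these characters can be encoded in US-ASCII, ISO 8859-1 (Latin-1), and
Windows CP1252. The sample characters and the helper names "fits" and
"report" are my own illustration, not anything defined by the encodings
themselves.

```python
# Illustrative sample of characters common in US business letters.
SAMPLES = {
    "left curly quote": "\u201c",
    "right curly quote": "\u201d",
    "curly apostrophe": "\u2019",
    "bullet": "\u2022",
    "trademark sign": "\u2122",
    "e with acute": "\u00e9",  # as in "resume" written with accents
}


def fits(ch: str, encoding: str) -> bool:
    """Return True if ch can be represented in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def report() -> dict:
    """Map each sample name to (fits ASCII, fits Latin-1, fits CP1252)."""
    return {
        name: (fits(ch, "ascii"), fits(ch, "latin-1"), fits(ch, "cp1252"))
        for name, ch in SAMPLES.items()
    }


if __name__ == "__main__":
    for name, flags in report().items():
        print(name, flags)
```

With these samples, only the accented "e" fits in Latin-1, none fit in
US-ASCII, and all of them fit in CP1252.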

If you think of UTF-8 solely as a compression mechanism then I agree that it
works somewhat better for English than for German, but it's not a
particularly good compression mechanism for either one. If you're thinking
of it solely for compression, you should probably consider some other

Most real-world US and UK documents are no more "compatible" (in Dan's
sense) with UTF-8 than are most European documents. Single-byte legacy tools
with no knowledge of UTF-8 are almost as likely to choke on a US
business letter converted to UTF-8 as on a German one, so you'd have to
assume in both cases that a non-UTF-8 legacy tool simply can't be used on a
UTF-8 document, in the US just as in France. The issues, and the solutions,
for English-speakers and non-English-speaking Europeans are not so
different. It's not some vast, right-wing...ahem, I mean English-speakers'
conspiracy. ;-)
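
A rough Python sketch of what a single-byte legacy tool would see, assuming
the tool interprets every byte as CP1252. The function name
"as_legacy_tool_sees_it" and the sample strings are hypothetical, chosen
only for illustration.

```python
def as_legacy_tool_sees_it(text: str, legacy: str = "cp1252") -> str:
    """Encode text as UTF-8, then misread the bytes as a single-byte code page.

    Bytes that the legacy code page leaves undefined are shown as U+FFFD.
    """
    return text.encode("utf-8").decode(legacy, errors="replace")


if __name__ == "__main__":
    us_sample = "Acme\u2122"        # "Acme" followed by a trademark sign
    de_sample = "Gr\u00fc\u00dfe"   # German greeting with u-umlaut and sharp s
    for sample in (us_sample, de_sample, "plain ASCII"):
        utf8_len = len(sample.encode("utf-8"))
        print(repr(sample), utf8_len, repr(as_legacy_tool_sees_it(sample)))
```

Both the US and the German sample come out garbled, while the pure ASCII
string passes through unchanged. The text is only "compatible" in this
sense if it never leaves the 7-bit range.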

__Glen Perkins__

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:41 EDT