Re: DEC multilingual code page, ISO 8859-1, etc.

From: Jungshik Shin (jshin@pantheon.yale.edu)
Date: Tue Mar 28 2000 - 18:23:41 EST


On Tue, 28 Mar 2000, Erik van der Poel wrote:

> That depends on the particular private code page. In the case of
> windows-1252, there are far more users with mail software that works
> with windows-1252 than UTF-8 (or any other encoding of Unicode).
>
> There is an old saying "Be conservative in what you send, liberal in
> what you accept". As far as windows-1252 vs UTF-8 is concerned, sending
> UTF-8 is *not* conservative. We need to wait until more people have
> UTF-8 capable software installed.

  I don't think sending out Windows-1252 on the wire is 'conservative'
either. (I'm so tired of getting Windows-1252 encoded messages with
NOT-SO-ESSENTIAL characters interspersed among valid ISO-8859-1 characters
that I'm going to write a procmail filter to remove/transliterate them.)
If you really want to stick to that saying, you should convert
Windows-1252 to ISO-8859-1 (with appropriate transliteration) on the way
out, just as the more standards-compliant programs on the MacOS side do
with MacRoman. Better still, give users the choice between UTF-8 and
ISO-8859-1 with transliteration for characters present only in
Windows-1252.
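
  To make the "convert on the way out" idea concrete, here is a minimal
sketch in Python. The function name and the ASCII fallbacks in the table
are my own illustrative choices, not anything a particular mailer does;
a real filter would want the user to pick the transliterations.

```python
# Hypothetical fallback table: the characters Windows-1252 places in
# 0x80-0x9F, none of which exist in ISO-8859-1. The ASCII stand-ins are
# illustrative assumptions only.
CP1252_FALLBACKS = {
    "\u20ac": "EUR", "\u201a": ",", "\u0192": "f", "\u201e": '"',
    "\u2026": "...", "\u2020": "+", "\u2021": "++", "\u02c6": "^",
    "\u2030": "o/oo", "\u0160": "S", "\u2039": "<", "\u0152": "OE",
    "\u017d": "Z", "\u2018": "'", "\u2019": "'", "\u201c": '"',
    "\u201d": '"', "\u2022": "*", "\u2013": "-", "\u2014": "--",
    "\u02dc": "~", "\u2122": "(TM)", "\u0161": "s", "\u203a": ">",
    "\u0153": "oe", "\u017e": "z", "\u0178": "Y",
}


def cp1252_to_latin1(data: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as ISO-8859-1, transliterating extras."""
    # Bytes that Windows-1252 leaves undefined decode to U+FFFD here.
    text = data.decode("cp1252", errors="replace")
    # Swap each Windows-1252-only character for its ASCII stand-in.
    converted = "".join(CP1252_FALLBACKS.get(ch, ch) for ch in text)
    # Anything still outside ISO-8859-1 (e.g. U+FFFD) becomes "?".
    return converted.encode("iso-8859-1", errors="replace")


if __name__ == "__main__":
    # Curly quotes and an en dash around ordinary Latin-1 text.
    print(cp1252_to_latin1(b"\x93Hi\x94 \x96 caf\xe9"))
```

  Characters shared by both encodings (such as 0xE9, e-acute) pass
through untouched; only the 0x80-0x9F additions are rewritten.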

  The same holds true for Windows-949 (a proprietary extension of
EUC-KR used in the Korean version of MS-Windows 9x), which also violates
ISO-2022 by using C1 characters in the first and second bytes and G0
characters in the second byte. Korean MS OE/FrontPage don't even try to
label documents/messages containing C1 characters as x-Windows-949;
they pretend that these are valid EUC-KR documents/messages (and under a
completely misleading, non-standard MIME charset name of MS's own
invention, namely ks_c_5601-1987). MS OE/FrontPage should convert
Windows-949 either to valid EUC-KR or to UTF-8. (There is a standard way
to encode in valid EUC-KR the characters that Windows-949 places in the
C1 area. It is supported by Mozilla 5.0 and can be rendered legibly even
by terminals that don't support that standard extension mechanism, which
is specified in KS X 1001.) Hmm, I realize that this paragraph would
have been better placed below Chris Pratley's claim
(<chrispr@microsoft.com>) that MS products at least label Windows-1252
properly. I believe that claim is true, but 'a little extension' of it
to Windows-949 is not.
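
  For illustration, the structural violation described above can be
checked mechanically. This is only a sketch under my own assumptions:
the function name is hypothetical, and it tests byte ranges only (every
two-byte EUC-KR character has both bytes in 0xA1-0xFE), not whether each
code point is actually assigned in KS X 1001.

```python
def is_structurally_euc_kr(data: bytes) -> bool:
    """Return True if every non-ASCII character is a GR/GR byte pair."""
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            # Plain ASCII byte: always acceptable.
            i += 1
            continue
        # A lead byte in the C1 range (or 0xA0/0xFF) is not valid EUC-KR.
        if not 0xA1 <= lead <= 0xFE:
            return False
        # The trail byte must exist and must also be in 0xA1-0xFE; a G0
        # (ASCII-range) or C1 trail byte marks a Windows-949 extension.
        if i + 1 >= n or not 0xA1 <= data[i + 1] <= 0xFE:
            return False
        i += 2
    return True


if __name__ == "__main__":
    print(is_structurally_euc_kr(b"\xb0\xa1"))  # both bytes in GR
    print(is_structurally_euc_kr(b"\x8c\x63"))  # C1 lead, G0 trail
```

  A message that fails this check but is labelled as EUC-KR (or
ks_c_5601-1987) is a candidate for the mislabelling described above.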

  I simply don't understand why MS-Windows network applications cannot
do with Windows-125x what MacOS applications have been doing for a long
time with MacRoman. It gets really interesting when you compare Eudora
for MS-Windows (which sends out a bunch of *mislabelled* messages, i.e.
Windows-1252 encoded but labelled as ISO-8859-1) with Eudora for MacOS
(which converts between MacRoman and ISO-8859-1 on the way out and in).
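
  On the receiving side, spotting such mislabelled messages is simple in
principle: ISO-8859-1 assigns no printable characters to 0x80-0x9F, so
any byte in that range in a message labelled ISO-8859-1 is suspicious.
A rough sketch follows; the function name and the short list of charset
aliases are assumptions of mine, not a complete alias table.

```python
# Hypothetical, deliberately short list of labels treated as ISO-8859-1.
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1"}


def looks_like_mislabelled_cp1252(body: bytes, declared_charset: str) -> bool:
    """Heuristic: labelled ISO-8859-1 but containing bytes in 0x80-0x9F."""
    if declared_charset.strip().lower() not in LATIN1_LABELS:
        return False
    # In ISO-8859-1 these bytes are C1 controls, which real mail text
    # almost never contains; in Windows-1252 they are quotes, dashes, etc.
    return any(0x80 <= b <= 0x9F for b in body)


if __name__ == "__main__":
    # 0x92 is the Windows-1252 right single quotation mark.
    print(looks_like_mislabelled_cp1252(b"It\x92s", "ISO-8859-1"))
```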

    Jungshik Shin


