Re: CP1252 under Unix

From: Frank da Cruz (fdc@columbia.edu)
Date: Sat Mar 25 2000 - 16:28:14 EST


> I agree with you that the overall goal should be to move to UTF-8 for
> transmission. However, ignoring 1252 and its cousins is both wrong and
> shortsighted.
>
> 1. Let's start with the wrong part. There are already IANA registered
> charsets that use the C1 area for graphic character sets.
>
The true question is whether IANA should have "registered" any of them.
Again, private, proprietary character sets have no place in any standard,
nor on the wire across the Internet. I really don't know what they were
thinking when they started down this path. I fully agree with you that
once they have registered Windows 1251 then they have no reason not to
register 1252 and every other code page that exists, and not just at IBM
and Microsoft either.

But please folks, let's not confuse the IANA registry with any kind
of standard. Standards imply at the very least consensus among conflicting
interests, and at best also some measure of quality control.

> 2. Now for the shortsighted part. The IANA registry is used for much more
> than simply interchange on the web. A registry of charset names is needed
> across all systems and platforms. That way, cross-platform programs can
> identify the local charsets, and successfully and accurately translate those
> to and from Unicode/10646 or specific other codesets.
>
Again: No, no, no! If you don't put private character sets on the wire,
you don't need to know a thing about them.

Does anybody who reads this list truly believe it is better to use private
code pages for interchange than it is to use standard ones? That means I can
send you ANYTHING AT ALL, even something you've never heard of, and it's your
fault if you can't read it, not mine.

If you are selling a Windows-based email client, HOW HARD IS IT to convert
outgoing mail from the local code page to ISO 8859 or other standard character
set? Ditto for Windows-based Web authoring tools. The fact that this has
not been done is no reason for the rest of the planet to drop what they are
doing (presumably moving us along towards a Unicode based network) and bend
over backwards to accommodate this kind of behavior.
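
To give a rough idea of how little is involved, here is a minimal sketch
in Python, not taken from any real mail client. The function name
to_wire_charset and the fallback order (US-ASCII, then ISO 8859-1, then
UTF-8) are assumptions made for illustration only.

```python
def to_wire_charset(raw: bytes, local_codepage: str = "cp1252"):
    """Re-encode text from a local code page into a standard charset.

    Hypothetical helper: returns (encoded_bytes, charset_label), where
    the label is what a client might put in the outgoing charset
    parameter.
    """
    # Decode from the private code page into Unicode first.
    # Note: Python's cp1252 codec raises UnicodeDecodeError on the five
    # byte values that CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    text = raw.decode(local_codepage)

    # Prefer the narrowest standard set that can represent the text.
    for charset in ("us-ascii", "iso-8859-1"):
        try:
            return text.encode(charset), charset
        except UnicodeEncodeError:
            continue

    # Anything else (curly quotes, the euro sign, ...) goes out as UTF-8.
    return text.encode("utf-8"), "utf-8"
```

Text that uses only the Latin-1 repertoire leaves as ISO 8859-1, while
the CP1252 characters that occupy the C1 area (0x80-0x9F) push the
message to UTF-8 instead of leaking out as raw code page bytes.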

> Our goal is to converge towards use of a single character set, but that
> transition is easier if we can precisely identify those character sets that
> ARE in use on the Web currently, not hiding our heads in the sand and hoping
> they will go away.
>
This is a backwards view of the problem. It is the responsibility of IBM,
Apple, Microsoft, and other companies with private character sets, or makers
of software that use these private sets for interchange, to convert them to
use standard sets, preferably UTF-8. That's where the problem is and that's
where to fix it.

Put yourself in the position of an ISV. I want to be a good world citizen.
What must I do? Should I code my applications for Unicode? No, that's not
enough. I have to code them to understand every character set that exists --
or at least that is significant in the marketplace (which marketplace?).

Does this promote the spread of Unicode?

- Frank


