RE: DEC multilingual code page, ISO 8859-1, etc.

From: Chris Pratley (chrispr@MICROSOFT.com)
Date: Fri Mar 24 2000 - 19:52:00 EST


Hmm. Do you have numbers on "vast majority"?

Office97 and Office2000 have always labelled windows-1252 as such in HTML,
so any of those 60 million users generating HTML will have it correctly
marked - it is hard to believe those docs don’t form the majority of truly
win-1252 pages by now. I am aware of some early web authoring tools (some
made by Microsoft) that would generate NCRs not based on Unicode, but they
were written before it was clearly noted in the HTML spec that those NCRs
need to be Unicode values. I think the original standard was a bit vague
about what those numbers mean, and especially by use of the term "document
charset". Although I think that was supposed to mean iso-8859-1 at the time
in standards parlance, the average person reading that would take it at face
value - the charset of the current document. It is a bit harsh to judge
those early developers based on a standard (use Unicode for NCRs) that was
only broadly communicated after their code was written.

Another major source of non-Unicode NCRs is handwritten pages, often
generated by commercial web site designers or by databases they designed.
I'm not sure how to reach them, other than mailing the web master when you
see such a site and letting them know their error. And I don’t think getting
hardcore and disabling the current browser workaround of treating &#128;
through &#159; as windows-1252 is the right way either - it is just
frustrating and leads to a "buggy" experience for the end-user. Eventually
those pages will disappear as more people use tools to generate correctly
formed pages, but I'm not holding my breath for the top Internet sites to
start doing that any time soon.
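
To make the workaround concrete, here is a rough sketch in Python of what a
lenient NCR resolver might look like. It is only an illustration, not any
browser's actual code: the names expand_ncrs and resolve_ncr and the lenient
flag are made up for this example, and it handles decimal NCRs only.

```python
import re

# Matches decimal numeric character references such as &#147;
NCR = re.compile(r"&#(\d+);")

def resolve_ncr(match, lenient=True):
    """Turn one decimal NCR into a character (hypothetical helper)."""
    code = int(match.group(1))
    if code > 0x10FFFF:
        # Not a valid Unicode code point; leave the reference untouched.
        return match.group(0)
    if lenient and 128 <= code <= 159:
        # Workaround: read the number as a windows-1252 byte value.
        try:
            return bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            pass  # unassigned in windows-1252 (e.g. 129); fall through
    # Strict reading: the number is a Unicode code point.
    return chr(code)

def expand_ncrs(text, lenient=True):
    """Expand every decimal NCR in text."""
    return NCR.sub(lambda m: resolve_ncr(m, lenient), text)

if __name__ == "__main__":
    sample = "&#147;quoted&#148; costs &#128;5"
    print(repr(expand_ncrs(sample)))                 # lenient reading
    print(repr(expand_ncrs(sample, lenient=False)))  # strict reading
```

With lenient=False the same references come out as C1 control characters
(U+0080 through U+009F), which is why strictly conforming rendering looks
broken to the end-user. For comparison, html.unescape in the Python standard
library applies a similar remapping for this range.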

Chris Pratley
Group Program Manager
Microsoft Word

-----Original Message-----
From: Robert Brady [mailto:robert@susu.org.uk]
Sent: Friday, 24 March 2000 10:16 AM
To: Unicode List
Cc: Unicode List
Subject: RE: DEC multilingual code page, ISO 8859-1, etc.

On Fri, 24 Mar 2000, Chris Pratley wrote:

> when the characters in the 80-CF range are used.  I'm curious why the makers
> of whatever browsers these are don't simply add support for non-ISO
> encodings like windows-1252 and be done with it (whether windows-1252 is

The problem is that the vast majority of windows-1252 texts aren't
actually labelled as such. Often, they will claim to be ISO-8859-1 or
US-ASCII and then actually be in Windows-1252.

Worse, in the case of HTML, they might claim to be in windows-1252, and
then expect &#128; to &#159; to do what they expect (as opposed to the
correct thing, which is to put question marks up there).

--

Robert


