RE: cp1252 decoder implementation

From: Doug Ewell <>
Date: Wed, 21 Nov 2012 11:30:50 -0700

"Peter Krefting" <peter at opera dot com> wrote:

>> Somewhat off-topic, I find it amusing that tolerance of "poorly
>> encoded" input is considered justification for changing the
>> underlying standards, when Internet Explorer has been flamed for
>> years and years for tolerating bad input.
> It's called adapting to reality, unfortunately. There are *a lot* of
> documents on the web labelled as being "iso-8859-1" and/or not
> labelled at all, which are using characters from the 1252 codepage.
> And since using the 1252 codepage to decode "proper" iso-8859-1 HTML
> documents does not hurt anyone (as HTML up to version 4 explicitly
> forbids the use of the control codes in the 0x80-0x9F range), that is
> what everyone does.
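[Editor's note: a minimal Python sketch, not from the thread, illustrating the overlap Peter describes. ISO-8859-1 and windows-1252 agree everywhere except 0x80-0x9F, where ISO-8859-1 yields C1 control codes and windows-1252 yields printable punctuation; the byte string below is an invented example.]

```python
# The same bytes decoded as ISO-8859-1 versus windows-1252.
# 0x93/0x94 are curly quotes and 0x97 is an em dash in cp1252,
# but C1 control codes in ISO-8859-1; 0xE9 (e-acute) is
# identical in both, as is every byte outside 0x80-0x9F.
data = b"\x93Hello\x94 \x97 caf\xe9"

latin1 = data.decode("iso-8859-1")
cp1252 = data.decode("cp1252")

print(repr(latin1))   # C1 controls show up as \x93, \x94, \x97
print(repr(cp1252))   # intended quotes and dash

# Bytes outside the 0x80-0x9F range decode identically:
assert latin1[-4:] == cp1252[-4:] == "caf\u00e9"
```

This is why decoding strictly conforming ISO-8859-1 HTML with the cp1252 table is lossless: such documents never contain 0x80-0x9F, so the two decoders produce identical output on them.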

My problem is with the double standard. In some people's minds, if IE
does it, it's called "moronic" or "brain-dead"; if W3C or Python does
the same thing, it's "adapting to reality."

>>> One browser started to accept data in a form that it shouldn't have
>>> accepted. Sloppy content producers started to rely on this. Because
>>> the browser in question was the dominant browser, other browsers had
>>> to try and re-engineer and follow that browser, or just be ignored.
>> Evidently it's OK if W3C or Python does it, but not if Microsoft does
>> it.
> Don't blame Microsoft here, it was Netscape (on Windows) that started
> it, by just mapping the iso-8859-1 input data to a windows-1252
> encoded font output. The same pages that would work "fine" on Windows
> would show garbage on Unix, until it was patched to also display it as
> codepage 1252. Internet Explorer wasn't even published when this
> happened, and I can't remember now whether the first versions of it
> actually did this, or if it was bolted on later.

This is the first time I've heard anyone say the problem didn't
originate with IE.

Doug Ewell | Thornton, Colorado, USA | @DougEwell
Received on Wed Nov 21 2012 - 12:31:56 CST

This archive was generated by hypermail 2.2.0 : Wed Nov 21 2012 - 12:31:56 CST