Re: cp1252 decoder implementation

From: Philippe Verdy <>
Date: Wed, 21 Nov 2012 05:13:18 +0100

To solve the situation, it would be smarter if the W3C did not reference
the Microsoft standard itself but a standardized version of it, explaining
explicitly how to handle the unassigned code positions. The W3C could
describe the expected mapping of these positions explicitly in its own
standard, or could publish an RFC and, possibly, register another code in
the IANA registry (but then it would cause another nightmare, because it
would need a new alias other than "windows-1252").

My opinion is that the W3C was wrong to indicate that the "ISO-8859-1"
charset had to be handled specially (as well as "windows-1252"). In my
opinion it should just reference the "cp1252" charset name, changing its
aliasing to "windows-1252" and making it a full charset in its own right,
endorsed by the W3C and described in its standard or in a published RFC,
making sure that all unassigned code positions of "windows-1252" will be
mapped **irrevocably** to C1 controls (even if later Microsoft decides to
map some other characters in its own "windows-1252" charset, like it did
several times and notably when the Euro symbol was mapped).
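
To make the proposed mapping concrete, here is a minimal Python sketch of a
decoder that sends the unassigned positions to the C1 controls with the same
numeric value. It assumes the five positions that Python's own "cp1252" codec
leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are the ones at issue; the
function name is hypothetical and this is not taken from any W3C or WHATWG
text.

```python
# Byte values that Python's "cp1252" codec leaves undefined (assumption:
# these are the "unassigned code positions" discussed in this message).
UNASSIGNED = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def decode_1252_with_c1(data: bytes) -> str:
    """Decode bytes as cp1252, mapping unassigned bytes to C1 controls.

    Hypothetical helper: each unassigned byte 0xNN becomes U+00NN.
    """
    out = []
    for b in data:
        if b in UNASSIGNED:
            # Same numeric value, interpreted as a C1 control code point.
            out.append(chr(b))
        else:
            # Every other byte is defined by the stock cp1252 codec.
            out.append(bytes([b]).decode("cp1252"))
    return "".join(out)
```

With this sketch, byte 0x80 still decodes to the Euro sign while 0x81 decodes
to U+0081 instead of raising an error.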

Charset aliases supported in HTML5 will then point to a new single
canonical charset. If "cp1252" is not appropriate for the choice of
canonical charset name, the W3C should propose another canonical name such
as "w3c-1252". And then the HTML5 standard will explain explicitly the
complete list of charset names to support as aliases to this canonical
charset, and should instruct web designers to use the canonical charset
name and nothing else, for HTML5 developments. If web designers are not
satisfied because HTML5 pages would lose their compatibility with HTML4,
they should use UTF-8 instead, which is still treated identically in both
HTML4 and HTML5.
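
The alias idea can be sketched as a simple lookup from label to one canonical
name. This is only an illustration: the canonical name "w3c-1252" is the
hypothetical one suggested in this message, the alias set holds only the
labels named here (not a complete list), and the function name is invented.

```python
from typing import Optional

# Hypothetical canonical name, as suggested in the message above.
CANONICAL = "w3c-1252"

# Only the labels mentioned in this message; a real list would be longer.
ALIASES = frozenset({"cp1252", "windows-1252", "iso-8859-1", "w3c-1252"})


def resolve_label(label: str) -> Optional[str]:
    """Return the canonical charset name for a label, or None if unknown."""
    key = label.strip().lower()  # labels are compared case-insensitively
    return CANONICAL if key in ALIASES else None
```

Under this sketch a page declared as "ISO-8859-1" and one declared as
"windows-1252" both resolve to the same decoder, while "utf-8" is left to be
handled separately.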

Anyway, it is almost impossible to support HTML4 renderers completely
with web designs made for HTML5, and most often web sites made
for HTML5 will offer an alternate navigation or presentation for HTML4
renderers (with their known "quirks" modes that complicate things a lot).
But if web designers have some good skills (and enough money!) they can
attempt to build sites that will work on HTML4-only renderers as well as
HTML5 renderers. But this challenge requires HUGE and complex development
with lots of tiny adjustments (in scripts, stylesheets), with an extensive
system for handling user support requests and testing the various solutions
on separate development and test platforms.

And today this challenge has only been met successfully by Wikimedia sites
(which also support multiple viewing modes, including for mobile devices).
But Wikimedia chose to support these multiple viewing modes in MediaWiki
only with UTF-8 (and it does not care about how to handle
windows-1252, which is not even supported). Even large corporate companies
are unable to support both HTML4 and HTML5: sites generally are moving
from one to the other, even if they know that their content may no longer
be accessible to users of older browsers.

The challenge is somewhere else today: it lies in the proliferation of
mobile browsing modes. As mobile browsing does not work very well, web
designers prefer developing mobile "applications" that will be installed on
tablets and smartphones, using separate (and incompatible) development for
each client application, but at least these applications can all work with
the same server-side application (communicating with their more basic
protocol, most often with XML or JSON data requests, and HTTP to download
images). Mobile applications have completely changed the way to develop web

There's a clear separation now between client side developments and
server-side back-ends (with the advantage of possibly removing the
server-side front end, if the web site will only be available to users of
mobile apps, or users of modern browsers that support client-side
deployments for specific browsers supporting an API for such deployments,
such as Chrome and now IE8+ and the recent Windows 8 application store). In
that case the server-side part of the website only needs ONE charset for
everything: UTF-8 (with data compression where needed).

It's simply easier for web designers to develop a few separate
client-side "application" front-ends (for iPhones, Android, Windows,
sometimes for a few other brands) than to support zillions of web browsers
and their versions. The bad side of it is the absence of a common
standard really supported across these client-side environments,
so they tend to split the Internet into separate worlds (and the dream of
interoperability becomes just that: a dream). It is not even ideal, because
even these proprietary platforms are starting to fragment as well, including
in the iPhone world, due to their versions and growing APIs not supported
in many older versions.

HTML5 is certainly the standard that should reconcile these proprietary
platforms around a common interoperable framework. But nobody knows if it
will be really well supported by those proprietary client-side development
platforms and by device manufacturers implementing them. The need for
interoperability has never been as acute as it is today.

But the battle is no longer within the charsets. Everybody now knows that
UTF-8 is THE charset for everything.

2012/11/20 Doug Ewell <>

> Buck Golemon <buck at yelp dot com> wrote:
> > What effort has been spent? This is not an either/or type of
> > proposition. If we can agree that it's an improvement (albeit small),
> > let's update the mapping.
> > Is it much harder than I believe it is?
> ISO/IEC 8859-1 is, uh, an ISO/IEC standard. CP1252 is a Microsoft
> corporate standard. One does not simply "update" someone else's
> standard, the WHATWG document and mapping tables notwithstanding.
Received on Tue Nov 20 2012 - 22:19:04 CST
