RE: Fun with proof by analogy, was Re: Mojibake on my Web pages

From: Francois Yergeau
Date: Mon Sep 29 2003 - 10:27:18 EDT

    James Kass wrote:
    > In the event of a conflict between the HTTP header and the HTML meta
    > tag, of course the browser should believe the HTML meta tag. After
    > all, who knows better than the author the encoding used to construct
    > the file?

    Who knows better the encoding used to *send* the file? The last server to
    touch it.

    It used to be common, the norm in fact, for Russian servers to store files
    in various legacy encodings (KOI-8, 8859-5, DOS-something,...) and to serve
    them in some other encoding, after transcoding on-the-fly based on the
    User-Agent. There were also transcoding proxies for Asian character sets
    that one could use to overcome the limitations of browsers of that era.
    These practices were still around when the HTML 4 spec was released in 1997
    and no doubt contributed to the spec giving the HTTP header precedence over
    the meta tag.
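
    To make the point concrete, here is a minimal sketch of a file stored in
    KOI8-R, transcoded on the fly to windows-1251, and then decoded by a client
    that trusts the HTTP header first. The function names, the naive regex
    scanning, and the ISO-8859-1 fallback are assumptions made for illustration;
    this is not how any particular server or browser is implemented.

```python
import re

def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    m = re.search(r'charset\s*=\s*"?([^\s;"]+)', content_type or "", re.IGNORECASE)
    return m.group(1).lower() if m else None

def charset_from_meta(body):
    """Naively scan the first bytes of the body for a meta charset declaration."""
    head = body[:1024].decode("ascii", errors="replace")
    m = re.search(r'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)',
                  head, re.IGNORECASE)
    return m.group(1).lower() if m else None

def effective_charset(content_type, body, default="iso-8859-1"):
    """HTTP header first, then the meta tag, then an assumed default."""
    return charset_from_header(content_type) or charset_from_meta(body) or default

# The author stored the file in KOI8-R and said so in the meta tag.
stored = ('<html><head><meta http-equiv="Content-Type" '
          'content="text/html; charset=koi8-r"></head>'
          '<body>Привет</body></html>').encode("koi8_r")

# The server transcodes on the fly and announces the new encoding in the
# header; the meta tag inside the body is now stale.
sent = stored.decode("koi8_r").encode("cp1251")
header = "text/html; charset=windows-1251"

chosen = effective_charset(header, sent)
text_by_header = sent.decode(chosen)                  # readable
text_by_meta = sent.decode(charset_from_meta(sent))   # mojibake
```

    Decoding with the charset from the header recovers the original text, while
    trusting the stale meta tag turns the same bytes into unrelated Cyrillic
    letters.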

    > Where the server has performed a character set conversion
    > upon request from a browser, then, as a part of the character set
    > conversion process, the HTML meta tag needs to be re-written in case
    > the page is archived by the visitor for later off-line viewing.

    It takes large amounts of tricky code to reliably parse real-life HTML. It
    is unreasonable to expect servers, which have no business parsing HTML, to
    contain this code. Browsers have it and *they* should adjust the meta tag
    when they do a "Save as..."
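
    As a rough illustration of the browser-side fix, the sketch below decodes
    the received bytes with the charset the browser actually used, rewrites (or
    adds) the meta declaration, and saves in a known encoding. The name
    `save_as`, the UTF-8 default, and the regex matching are hypothetical; a
    real browser would work from its parsed document rather than a regular
    expression, for exactly the reason given above.

```python
import re

# Matches the charset value inside the first meta tag that declares one.
_META_CHARSET = re.compile(
    r'(<meta\b[^>]*?\bcharset\s*=\s*["\']?)([A-Za-z0-9._:-]+)',
    re.IGNORECASE,
)

def save_as(received, wire_charset, save_charset="utf-8"):
    """Return bytes to write to disk, with a meta tag matching their encoding."""
    text = received.decode(wire_charset)
    # Replace the declared charset, keeping the rest of the tag untouched.
    text, count = _META_CHARSET.subn(
        lambda m: m.group(1) + save_charset, text, count=1)
    if count == 0:
        # No declaration found: add one right after the opening head tag.
        meta = ('<meta http-equiv="Content-Type" '
                'content="text/html; charset=%s">' % save_charset)
        text = re.sub(r'(<head\b[^>]*>)', lambda m: m.group(1) + meta,
                      text, count=1, flags=re.IGNORECASE)
    return text.encode(save_charset)

# Bytes as received: windows-1251 on the wire, stale koi8-r meta tag inside.
received = ('<html><head><meta http-equiv="Content-Type" '
            'content="text/html; charset=koi8-r"></head>'
            '<body>Привет</body></html>').encode("cp1251")

saved = save_as(received, "windows-1251")
```

    The saved copy then carries a declaration that agrees with its own bytes,
    so it stays readable off-line, where no HTTP header is available.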

    > If this were the case, we wouldn't be having this thread.

    If servers would just shut up when they don't know (as required by the HTML

    François Yergeau
