RE: Fun with proof by analogy, was Re: Mojibake on my Web pages

From: Francois Yergeau (FYergeau@alis.com)
Date: Mon Sep 29 2003 - 10:27:18 EDT


    James Kass wrote:
    > In the event of a conflict between the HTTP header and the HTML meta
    > tag, of course the browser should believe the HTML meta tag. After
    > all, who knows better than the author the encoding used to construct
    > the file?

    Who knows better the encoding used to *send* the file? The last server to
    touch it.

    It used to be common, the norm in fact, for Russian servers to store files
    in various legacy encodings (KOI-8, 8859-5, DOS-something,...) and to serve
    them in some other encoding, after transcoding on-the-fly based on the
    User-Agent. There were also transcoding proxies for Asian character sets
    that one could use to overcome the limitations of browsers of that era.
    These practices were still around when the HTML 4 spec was released in 1997
    and no doubt contributed to the precedence rules being what they are.
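
To make the precedence concrete, here is a minimal sketch of how a client might pick the encoding when a transcoding server has changed the bytes but left the meta tag alone. The function names, the 1024-byte sniffing window and the ISO-8859-1 fallback are assumptions for illustration, not anything mandated by a spec or taken from a real browser.

```python
import re
from email.message import Message

# Hypothetical, deliberately naive pattern: finds charset=... inside a <meta> tag.
META_RE = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)',
                     re.IGNORECASE)


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def charset_from_meta(body):
    """Look for a meta charset declaration near the start of the raw bytes."""
    match = META_RE.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(header_value, body, default="iso-8859-1"):
    """HTTP header first, then the meta tag, then an assumed fallback."""
    return (charset_from_content_type(header_value)
            or charset_from_meta(body)
            or default)


# A file authored as windows-1251, transcoded to KOI8-R by the server,
# with the author's meta tag left untouched.
page = ('<meta http-equiv="Content-Type" '
        'content="text/html; charset=windows-1251"><p>Привет</p>')
body = page.encode("koi8-r")
chosen = effective_charset("text/html; charset=KOI8-R", body)
text = body.decode(chosen)
```

Decoding `body` with the charset named in the meta tag instead would produce mojibake, which is the point: only the last server to touch the bytes knows what they are.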

    > Where the server has performed a character set conversion
    > upon request from a browser, then, as a part of the character set
    > conversion process, the HTML meta tag needs to be re-written in case
    > the page is archived by the visitor for later off-line viewing.

    It takes large amounts of tricky code to reliably parse real-life HTML. It
    is unreasonable to expect servers, which have no business parsing HTML, to
    contain this code. Browsers have it and *they* should adjust the meta tag
    when they do a "Save as..."
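
As a rough illustration of the "Save as..." step, the sketch below rewrites the declared charset in already-decoded text and then encodes the result in the encoding chosen for the saved file. It is regex-based and only copes with a simple, well-formed meta tag; handling real-life HTML would need the browser's actual parser. The names `rewrite_meta_charset` and `save_as` are hypothetical.

```python
import re

# Hypothetical pattern: captures everything up to the charset value in a <meta> tag.
META_CHARSET_RE = re.compile(
    r'(<meta\b[^>]*?\bcharset\s*=\s*["\']?)([A-Za-z0-9_.:-]+)',
    re.IGNORECASE,
)


def rewrite_meta_charset(html_text, new_charset):
    """Replace the first declared meta charset with new_charset."""
    return META_CHARSET_RE.sub(lambda m: m.group(1) + new_charset,
                               html_text, count=1)


def save_as(html_text, new_charset):
    """Return bytes for the saved file, with the meta tag matching the bytes."""
    return rewrite_meta_charset(html_text, new_charset).encode(new_charset)


# The browser already holds decoded text; it saves it as UTF-8.
page = ('<meta http-equiv="Content-Type" '
        'content="text/html; charset=koi8-r"><p>Привет</p>')
saved = save_as(page, "utf-8")
```

Because the browser has already decoded the page, it is the one component that knows both the text and the encoding it is about to write, so the declaration and the bytes cannot drift apart.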

    > If this were the case, we wouldn't be having this thread.

    If servers would just shut up when they don't know (as required by the HTML
    spec)....

    -- 
    François Yergeau
    

