Re: Frequent incorrect guesses by the charset autodetection in IE7

From: James Kass (
Date: Thu Jul 13 2006 - 08:23:08 CDT


    Philippe Verdy wrote,

    > An example of a website that becomes horrible because of that, or that exhibits
    > runtime errors in JavaScript due to incorrect selection in a page that is
    > clearly French and with enough text content to confirm this, including the
    > domain name, and where no CJK charset should be autodetected:
    > (this is the official web site for the French delegation of the Red Cross).

    The French Red Cross page has no character set declaration. (Well, it has one,
    but its value is left empty.) The very first character in the HTML file which
    is not mark-up is ✚ (U+271A, HEAVY GREEK CROSS). But that character is written
    as an NCR (numeric character reference), which is, of course, pure ASCII and
    shouldn't affect any heuristics regarding character sets.
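    To illustrate why an NCR is invisible to byte-level charset heuristics, here is
    a small sketch using only the Python standard library. It shows that the
    reference is plain ASCII in the raw file and only becomes U+271A after entity
    decoding. Whether the page used the hex form or the decimal form is not known;
    the sketch assumes either one and checks both.

```python
import html
import unicodedata

# Hypothetical markup: the cross written as a numeric character reference (NCR).
# Both the hex and the decimal spelling are pure ASCII in the raw file.
for ncr in ("&#x271A;", "&#10010;"):
    raw = ncr.encode("ascii")            # succeeds: every character is ASCII
    assert all(b < 0x80 for b in raw)    # no high bytes for a charset sniffer to see
    char = html.unescape(ncr)            # only the HTML parser turns it into U+271A
    print(ncr, "->", f"U+{ord(char):04X}", unicodedata.name(char))
```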

    Accented characters written as named HTML entity references (like &eacute;)
    display just fine on this page, while raw, unescaped non-ASCII text seems to
    display as CJK ideographs.
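    A rough way to see how unescaped accented text can turn into CJK ideographs
    is to encode French text in one charset and decode it with a CJK codec. The
    sketch below assumes, purely for illustration, that the page bytes are UTF-8
    and that the browser guessed GB2312; the actual encoding of the page and the
    CJK charset IE7 picked are not known here. It also shows that a named entity
    reference survives such a guess, because it is plain ASCII.

```python
import html
import unicodedata

# Hypothetical scenario: French text stored as UTF-8, misread as GB2312.
text = "Croix-Rouge française"
raw = text.encode("utf-8")          # "ç" becomes the two bytes C3 A7

# errors="replace" would substitute U+FFFD for any byte pair GB2312 cannot map.
misread = raw.decode("gb2312", errors="replace")

# The ASCII letters pass through unchanged; only the accented letter changes.
for ch in misread:
    if ord(ch) > 0x7F:
        print(repr(ch), unicodedata.name(ch, "UNKNOWN"))

# The same word written with a named reference is pure ASCII, so an
# ASCII-compatible wrong guess leaves it intact for the HTML parser to expand.
escaped = "fran&ccedil;aise".encode("ascii").decode("gb2312")
print(html.unescape(escaped))       # française
```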

    Interestingly, setting the encoding to Auto-Select in MSIE 6 results in
    correct display. (I normally operate with Auto-Select disabled.)

    In the absence of a character set declaration in the HTML, why shouldn't
    a modern browser default to UTF-8? Unicode is the universal character
    set, and UTF-8 is its most widely used encoding form on web pages.
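    A minimal sketch of that policy: honour a non-empty, recognised declared
    charset; otherwise try strict UTF-8, and fall back to a legacy default only
    when the bytes are not valid UTF-8. The helper name `decode_page` and the
    windows-1252 fallback are assumptions for illustration, not how any browser
    actually behaves.

```python
import codecs
from typing import Optional, Tuple


def decode_page(body: bytes, declared: Optional[str],
                fallback: str = "windows-1252") -> Tuple[str, str]:
    """Return (text, encoding_used). Hypothetical helper, not a browser API."""
    # An empty or blank declaration (e.g. "charset=") counts as no declaration.
    if declared and declared.strip():
        enc = declared.strip()
        try:
            codecs.lookup(enc)  # raises LookupError for unknown labels
        except LookupError:
            pass  # unusable label: carry on as if nothing were declared
        else:
            return body.decode(enc, errors="replace"), enc
    try:
        # Strict UTF-8 decoding rejects most legacy 8-bit text containing
        # accented letters, so UTF-8 is a fairly safe first guess.
        return body.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return body.decode(fallback, errors="replace"), fallback


print(decode_page("française".encode("utf-8"), ""))      # ('française', 'utf-8')
print(decode_page("française".encode("latin-1"), None))  # ('française', 'windows-1252')
```

    The design leans on UTF-8 being self-validating: Latin-1 "ç" (0xE7) followed
    by an ASCII letter is not a legal UTF-8 sequence, so the fallback kicks in.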

    > Having to manually select the correct encoding when navigating a large web site
    > with many pages is really irritating for users...

    That is why I normally operate with Auto-Select disabled. Choose the
    character set you expect to encounter most often, set the browser to
    that character set, and disable Auto-Select. Pages that are correctly
    labelled and served will display in their correct character sets; pages
    that aren't will display in your chosen default.
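    The "declared charset, else my default" behaviour can be sketched as below.
    The parsing is deliberately simplified (a hypothetical helper, not a complete
    HTTP or MIME parameter parser), and it treats an empty charset value, as on
    the Red Cross page, as no declaration at all.

```python
from typing import Optional


def charset_from_content_type(content_type: Optional[str]) -> Optional[str]:
    """Extract the charset parameter, e.g. from 'text/html; charset=utf-8'.
    Returns None when the parameter is missing or empty (hypothetical helper)."""
    if not content_type:
        return None
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            value = value.strip().strip('"').strip()
            return value or None  # "charset=" with no value -> undeclared
    return None


def pick_charset(content_type: Optional[str], user_default: str) -> str:
    """No sniffing: a non-empty label wins, otherwise the user's chosen default."""
    return charset_from_content_type(content_type) or user_default


print(pick_charset("text/html; charset=", "windows-1252"))         # windows-1252
print(pick_charset('text/html; charset="UTF-8"', "windows-1252"))  # UTF-8
```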

    > ... (why doesn't IE consider the
    > selected encoding of the previous page when navigating across pages of the same
    > domain, when the same multiple encodings are possible candidates for the autodetection heuristic?)

    If all the pages in the same large domain are equally bad, this wouldn't
    solve anything. In the Red Cross example, the page that redirects to the
    example page seems to have the same problem.
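    The quoted suggestion (reusing the previous page's encoding within a domain)
    might be sketched as a per-host cache like the one below; the class and method
    names are hypothetical and this is not an actual IE feature. It also makes the
    objection above concrete: when nothing is declared, the first guess for a host
    is simply carried forward, so one bad guess on the entry page repeats on every
    later page of that site.

```python
from typing import Callable, Dict, Optional
from urllib.parse import urlsplit


class HostEncodingCache:
    """Remember the last encoding used per host (hypothetical sketch)."""

    def __init__(self, guess: Callable[[bytes], str]):
        self._guess = guess                  # the sniffing heuristic, supplied by caller
        self._by_host: Dict[str, str] = {}

    def encoding_for(self, url: str, body: bytes, declared: Optional[str]) -> str:
        host = urlsplit(url).hostname or ""
        if declared:                         # a non-empty declaration always wins
            enc = declared
        elif host in self._by_host:          # reuse the previous page's choice
            enc = self._by_host[host]
        else:                                # first unlabelled page on this host: sniff
            enc = self._guess(body)
        self._by_host[host] = enc            # remember whatever was used last
        return enc


cache = HostEncodingCache(guess=lambda body: "gb2312")  # deliberately wrong guesser
print(cache.encoding_for("http://example.org/a", b"", None))  # gb2312
print(cache.encoding_for("http://example.org/b", b"", None))  # gb2312 again
```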

    > ...but non-profit organizations often lack the money and an internal development team
    > to make such corrections in what could be a nightmare for them to handle ...

    Sounds like they need a volunteer. Perhaps someone who speaks French and
    appears to have a little spare time?

    Best regards,

    James Kass
