Re: Frequent incorrect guesses by the charset autodetection in IE7

From: Sinnathurai Srivas
Date: Thu Jul 13 2006 - 19:19:23 CDT


    An important matter is that all new ISO 8859-x encodings have been banned
    for nearly 20 years now. This means support was forthcoming from Microsoft
    and all major players starting 20 years ago.

    Unfortunately, the illegal hacked ASCII and hacked ISO 8859 are still the
    default of these major players.

    What an unjust world.


    ----- Original Message -----
    From: "James Kass" <>
    To: "Philippe Verdy" <>; "Unicode Mailing List"
    Sent: Thursday, July 13, 2006 2:23 PM
    Subject: Re: Frequent incorrect guesses by the charset autodetection in IE7

    > Philippe Verdy wrote,
    >> An example of a website that becomes horrible because of that, or that
    >> exhibits runtime errors in JavaScript due to incorrect selection, in a
    >> page that is clearly French and with enough text content to confirm
    >> this, including the domain name, and where no CJK charset should be
    >> autodetected:
    >> (this is the official web site for the French delegation of the Red
    >> Cross).
    > The French Red Cross page has no character set declaration. (Well, it has
    > one, but it is left blank/empty.) The very first character in the HTML
    > file which is not mark-up is &#10010; (U+271A, HEAVY GREEK CROSS). But
    > that's an NCR which is, of course, in ASCII and shouldn't affect any
    > heuristics regarding character sets.
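A minimal Python sketch of the point above: a numeric character reference is written entirely in ASCII bytes, so it gives a byte-level charset detector nothing to work with, yet an HTML parser still resolves it to U+271A. The variable names are illustrative only.

```python
import html
import unicodedata

ncr = "&#10010;"

# The reference itself is pure ASCII: every byte is below 0x80.
raw = ncr.encode("ascii")
assert all(b < 0x80 for b in raw)

# An HTML parser resolves the reference to the intended code point,
# whatever charset the surrounding bytes are decoded with.
ch = html.unescape(ncr)
print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```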
    > Accented characters called with named HTML references (like &eacute;)
    > display just fine on this page while non-ASCII material seems to display
    > as CJK ideographs.
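The symptom described above can be reproduced in a few lines of Python. This sketch assumes the page's raw bytes are ISO 8859-1 and that the browser guessed GBK; the thread does not say which CJK charset IE actually picked, so GBK is only an illustrative choice.

```python
text = "française"

# Raw non-ASCII bytes, as an undeclared ISO 8859-1 page would carry them.
raw = text.encode("iso-8859-1")

# Decoded with the intended charset, the text is fine.
right = raw.decode("iso-8859-1")

# Decoded as a double-byte CJK charset (GBK assumed here), the byte for "ç"
# pairs up with the following ASCII byte and becomes a single ideograph.
wrong = raw.decode("gbk", errors="replace")

print(right)
print(wrong)

# A named reference such as &ccedil; is pure ASCII, so a wrong charset
# guess cannot damage it.
safe = "fran&ccedil;aise".encode("ascii")
```

This is why the named references on the page survive while the raw accented bytes do not: the former never leave the ASCII range.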
    > Interestingly, setting the character set to auto-detect in MSIE 6 results
    > in correct display. (I normally operate with auto-detect disabled.)
    > In the absence of a character set declaration in the HTML, why shouldn't
    > a modern browser default to UTF-8? Unicode is the universal character
    > set and UTF-8 its most popular character set in web pages.
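One reason a UTF-8 default is comparatively safe is that UTF-8 is self-validating: legacy single-byte text containing accented letters is almost never well-formed UTF-8, so a strict decode fails loudly instead of silently producing wrong characters. A hedged Python sketch follows; the function name and the windows-1252 fallback are assumptions for illustration, not a description of what any browser actually does.

```python
from typing import Tuple


def decode_undeclared(raw: bytes, fallback: str = "windows-1252") -> Tuple[str, str]:
    """Decode bytes that carry no charset label.

    Try strict UTF-8 first; if the bytes are not well-formed UTF-8,
    fall back to a legacy single-byte charset (a hypothetical default).
    Returns (text, charset_used).
    """
    try:
        return raw.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace"), fallback


# The same French text, served once as UTF-8 and once as ISO 8859-1.
print(decode_undeclared("Croix-Rouge française".encode("utf-8")))
print(decode_undeclared("Croix-Rouge française".encode("iso-8859-1")))
```

Both calls recover the same text; only the reported charset differs.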
    >> Having to manually select the correct encoding when navigating a large
    >> web site with many pages is really irritating for users...
    > Which is why I normally operate with auto-select disabled. Choose the
    > character set which you expect to encounter most often, set the browser
    > to that character set, and disable auto-select. Pages correctly labelled
    > and served will display in their correct character sets; pages which
    > aren't will display in your selected default.
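The policy described above, sketched in Python: a page's declared charset wins when present, and the user's chosen default applies otherwise. The regular expression is a deliberately simplified assumption (it only handles the common charset= form inside a meta tag and ignores HTTP headers), and declared_charset and choose_charset are hypothetical names.

```python
import re
from typing import Optional

# Simplified: matches charset=... inside a <meta> tag; a blank value yields b"".
_META_CHARSET = re.compile(
    rb"""<meta[^>]*charset\s*=\s*["']?([A-Za-z0-9._:-]*)""", re.IGNORECASE
)


def declared_charset(head: bytes) -> Optional[str]:
    """Return the charset named in a <meta> tag, or None if absent or blank."""
    m = _META_CHARSET.search(head)
    if not m or not m.group(1):
        return None
    return m.group(1).decode("ascii").lower()


def choose_charset(head: bytes, user_default: str) -> str:
    """Labelled pages use their label; unlabelled pages use the user's default."""
    return declared_charset(head) or user_default


# A declaration that is present but left blank is treated as no declaration.
blank = b'<meta http-equiv="Content-Type" content="text/html; charset=">'
print(choose_charset(blank, "iso-8859-1"))
print(choose_charset(b'<meta charset="UTF-8">', "iso-8859-1"))
```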
    >> ... (why doesn't IE consider the selected encoding of the previous page
    >> when navigating across pages of the same domain, when the same multiple
    >> encodings are possible candidates for the autodetection heuristic?)
    > If all the pages in the same large domain are equally bad, this wouldn't
    > solve anything. In the Red Cross example, the redirect page to the
    > example seems to have the same problem.
    >> ...but non-profit organizations often lack the money and internal
    >> development team to make such corrections in what could be a nightmare
    >> for them to handle ...
    > Sounds like they need a volunteer. Perhaps someone who speaks French and
    > appears to have a little spare time?
    > Best regards,
    > James Kass
