RE: Win IE 7b2 and UTF-8

From: Keutgen, Walter (
Date: Fri May 12 2006 - 12:51:22 CDT


    Your excellent conclusion leaves the autodetection aside.

    Philippe Verdy wrote:

    > Doesn't it break or severely limit the encoding autodetection in IE?
    > This may explain why IE so often displays Chinese characters in the middle
    > of a French webpage hosted on a server that simply does not specify its
    > actual encoding: IE returns a false positive match with UTF-8, instead
    > of identifying the ISO-8859-1 encoding that was actually used.
    > This is a severe and very annoying bug for users (like French users trying
    > to read webpages that were encoded as ISO-8859-1 but interpreted by default
    > as UTF-8 as if it was Chinese, even though it would be invalid UTF-8).

    Microsoft should leave ill-formed UTF-8 sequences aside when determining the coded character set.

    Or alternatively, would it not be simpler to stick to the standards and choose ISO-8859-1 when the HTML source does not specify a charset? More philosophically, is it really better to try to outdo the standards?
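
To make the suggestion concrete, here is a minimal Python sketch of such a fallback. It is only an illustration, not a description of what IE does: the helper name `guess_charset` is hypothetical, and the rule it applies (accept UTF-8 only if the whole stream is well formed, otherwise assume ISO-8859-1) is my assumption of how the policy could look.

```python
def guess_charset(data: bytes) -> str:
    """Hypothetical helper: pick a charset for a page that declares none.

    Returns "utf-8" only when every byte sequence is well-formed UTF-8,
    otherwise falls back to "iso-8859-1".
    """
    try:
        # Strict decoding raises on the first ill-formed sequence.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "iso-8859-1"
    return "utf-8"


# The same French text, stored in the two encodings.
french_latin1 = "très ennuyeux".encode("iso-8859-1")
french_utf8 = "très ennuyeux".encode("utf-8")

print(guess_charset(french_latin1))
print(guess_charset(french_utf8))
```

Note that pure ASCII data passes the UTF-8 check as well, which is harmless because both encodings agree on that range.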

    The reader can still correct this by choosing the appropriate encoding. Microsoft could then satisfy everybody by offering 'UTF-8 strict' and 'UTF-8 liberal' options or, better, when the UTF-8 stream contains ill-formed sequences, by asking the user in a pop-up dialogue whether to accept them.
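
As a rough illustration of the two modes, the Python sketch below uses a hypothetical function `decode_utf8` with a `liberal` flag. I am assuming here that 'liberal' means "keep going and substitute U+FFFD for each ill-formed sequence"; other readings (for example, accepting overlong forms) are possible and are not shown.

```python
def decode_utf8(data: bytes, liberal: bool = False) -> str:
    """Hypothetical decoder with a strict and a liberal mode.

    strict:  raise UnicodeDecodeError on any ill-formed sequence.
    liberal: replace each ill-formed sequence with U+FFFD and continue.
    """
    return data.decode("utf-8", errors="replace" if liberal else "strict")


# 0xE9 is "é" in ISO-8859-1, but here it is a UTF-8 lead byte
# that is not followed by the continuation bytes it requires.
ill_formed = b"caf\xe9 au lait"

print(decode_utf8(ill_formed, liberal=True))

try:
    decode_utf8(ill_formed)
except UnicodeDecodeError as err:
    # This is the point where a browser could show the pop-up dialogue.
    print("ill-formed UTF-8:", err.reason)
```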

    Best regards

    Walter Keutgen
    Unisys Belgium

    -----Original Message-----
    From: [] On Behalf Of Doug Ewell
    Sent: 12 May 2006 17:45
    To: Unicode Mailing List
    Subject: Re: Win IE 7b2 and UTF-8

    Through the years, Microsoft and especially IE have taken a great deal
    of criticism for being either too liberal or too conservative (or both)
    in what they accept. Whichever they choose, there is sure to be someone
    waiting in the wings to lambast them for it.

    IMHO, what Microsoft should do with regard to decoding invalid UTF-8
    sequences is make a decision, one way or the other, and document that
    decision openly. That way the debate, and there is sure to be one, will
    have to focus on the policy and not whether the software is "buggy."

    My personal preference (RFC 793 notwithstanding) would be for IE to
    decline to interpret invalid UTF-8, since that is the more secure
    approach. As Philippe himself pointed out, there's probably not much of
    this type of data out there. But it is their call.

    Doug Ewell
    Fullerton, California, USA

    This archive was generated by hypermail 2.1.5 : Fri May 12 2006 - 12:54:28 CDT