Re: Win IE 7b2 and UTF-8

From: Doug Ewell
Date: Sat May 13 2006 - 00:06:47 CDT

    Keutgen, Walter <walter dot keutgen at be dot unisys dot com> wrote:

    > Microsoft should leave the ill-formed UTF-8 sequences aside for the
    > determination of the coded character set.

    I agree that if encodings need to be autodetected, allowing invalid
    UTF-8 to be handled as though it were valid UTF-8 hampers that effort.
    It is a shame -- but as Mark Davis said, probably a given -- that
    autodetection is necessary at all.
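
    To illustrate why strictness matters for autodetection, here is a
    minimal sketch, not a description of what any browser actually does.
    It treats "decodes as well-formed UTF-8" as the only signal, and falls
    back to ISO-8859-1 otherwise. The function name detect_charset and the
    fallback parameter are my own, made up for this example.

```python
def detect_charset(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Guess a charset for unlabeled bytes.

    Hypothetical helper: returns "utf-8" only when every byte sequence
    in `data` is well-formed UTF-8, and `fallback` otherwise.
    """
    try:
        # Python's built-in decoder is strict: it rejects overlong forms,
        # encoded surrogates, stray continuation bytes and truncation.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return fallback
    return "utf-8"
```

    If the UTF-8 test were lenient, a Latin-1 page whose accented letters
    happen to look like sloppy UTF-8 could be misdetected, which is the
    problem being discussed here. Note that pure ASCII input is reported
    as UTF-8, which is harmless since the two agree on those bytes.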

    > Or alternatively, would it not be simpler to stick to the standards
    > and choose ISO-8859-1 when the HTML source does not provide any
    > charset?

    Actually, the code to do what IE does is of about equal complexity to
    the code to interpret UTF-8 strictly. I doubt that simplicity had
    anything to do with the choice.
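
    As a rough illustration of the complexity point, here is a sketch of a
    hand-written decoder with a "strict" switch. This is not IE's code,
    which I have not seen; the lenient branch is only my assumption about
    what a liberal decoder might tolerate (overlong forms and encoded
    surrogates). The name decode_utf8 and its strict parameter are
    hypothetical.

```python
def decode_utf8(data: bytes, strict: bool = True) -> str:
    """Decode UTF-8 by hand; `strict` adds the well-formedness checks."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # single-byte (ASCII) sequence
            out.append(chr(b))
            i += 1
            continue
        # Lead byte gives the sequence length, the payload bits, and the
        # smallest code point that legitimately needs that length.
        if 0xC0 <= b <= 0xDF:
            length, cp, min_cp = 2, b & 0x1F, 0x80
        elif 0xE0 <= b <= 0xEF:
            length, cp, min_cp = 3, b & 0x0F, 0x800
        elif 0xF0 <= b <= 0xF7:
            length, cp, min_cp = 4, b & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for c in data[i + 1:i + length]:
            if c & 0xC0 != 0x80:          # continuation must be 10xxxxxx
                raise ValueError(f"bad continuation byte near offset {i}")
            cp = (cp << 6) | (c & 0x3F)
        if cp > 0x10FFFF:                 # beyond Unicode in either mode
            raise ValueError(f"code point out of range at offset {i}")
        # The whole difference between the two modes is this one test:
        # reject overlong encodings and encoded surrogates.
        if strict and (cp < min_cp or 0xD800 <= cp <= 0xDFFF):
            raise ValueError(f"ill-formed sequence at offset {i}")
        out.append(chr(cp))
        i += length
    return "".join(out)
```

    In this sketch, decode_utf8(b"\xc0\xaf", strict=False) yields "/",
    while the strict call raises ValueError. The extra work for strictness
    is a single comparison per multi-byte sequence, so neither variant is
    meaningfully simpler to write than the other.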

    > More philosophically, is it really better to try making it better than
    > the standards?

    I *strongly* doubt that Microsoft is trying to reinvent UTF-8. As I
    said, they were probably trying to "be liberal in what they accept," and
    not have people throw eggs at their windows because some badly encoded
    Web page wouldn't display.

    > The reader can still correct by choosing the appropriate encoding.
    > Then Microsoft could satisfy everybody by offering 'UTF-8 strict' and
    > 'UTF-8 liberal' or better, if the UTF-8 stream contains ill-formed
    > sequences, offering the user the option to accept them via a pop-up
    > dialogue.

    How many users who do not subscribe to the Unicode list would understand
    how to use an option like that?

    Doug Ewell
    Fullerton, California, USA