Date: Sat May 13 2006 - 00:06:47 CDT

    Keutgen, Walter <walter dot keutgen at be dot unisys dot com> wrote:

    > Microsoft should leave the ill-formed UTF-8 sequences aside when
    > determining the coded character set.

    I agree that if encodings need to be autodetected, allowing invalid
    UTF-8 to be handled as though it were valid UTF-8 hampers that effort.
    It is a shame (but, as Mark Davis said, probably a given) that
    autodetection is necessary at all.
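    To make the autodetection point concrete, here is a minimal Python
    sketch of one common heuristic: label a byte stream UTF-8 only if it
    decodes as well-formed UTF-8, and otherwise fall back to ISO-8859-1.
    The function name `guess_encoding` and the choice of fallback are
    assumptions for illustration. This is not a description of what IE
    actually does.

```python
def guess_encoding(data: bytes) -> str:
    """Hypothetical heuristic: call data UTF-8 only if it is well-formed UTF-8."""
    try:
        # Python's UTF-8 codec is strict: it rejects overlong forms,
        # encoded surrogates, and truncated or stray continuation bytes.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Ill-formed as UTF-8: fall back to ISO-8859-1, which maps every byte.
        return "iso-8859-1"
    return "utf-8"


print(guess_encoding("café".encode("utf-8")))       # utf-8
print(guess_encoding("café".encode("iso-8859-1")))  # iso-8859-1
print(guess_encoding(b"\xc0\xaf"))                  # iso-8859-1 (overlong "/")
```

    A decoder that accepted the overlong sequence C0 AF as "/" would label
    that last input UTF-8, which is exactly how liberal handling of invalid
    UTF-8 weakens detection.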

    > Or alternatively, would it not be simpler to stick to the standards
    > and choose ISO-8859-1 when the HTML source does not specify a
    > charset?

    Actually, the code to do what IE does is about as complex as the code
    to interpret UTF-8 strictly. I doubt simplicity had anything to do
    with Microsoft's choice.
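    To illustrate the point about relative complexity, here is a
    hand-rolled Python decoder in which the liberal mode simply skips two
    checks: rejecting overlong ("non-shortest") forms and rejecting encoded
    surrogates. The name `decode_utf8` and its `strict` flag are
    hypothetical, and this shows only one plausible kind of leniency, not a
    reconstruction of IE's actual behavior.

```python
def decode_utf8(data: bytes, strict: bool = True) -> str:
    """Hypothetical UTF-8 decoder. With strict=False, overlong forms and
    encoded surrogates are accepted instead of rejected."""
    out = []
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        # Lead byte gives the sequence length, its payload bits, and the
        # smallest code point that legitimately needs that many bytes.
        if b0 < 0x80:
            length, cp, min_cp = 1, b0, 0
        elif 0xC0 <= b0 <= 0xDF:
            length, cp, min_cp = 2, b0 & 0x1F, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            length, cp, min_cp = 3, b0 & 0x0F, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            length, cp, min_cp = 4, b0 & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{b0:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1 : i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError(f"invalid continuation byte after offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        # These two checks are the entire difference between the modes.
        if strict:
            if cp < min_cp:
                raise ValueError(f"overlong form at offset {i}")
            if 0xD800 <= cp <= 0xDFFF:
                raise ValueError(f"encoded surrogate at offset {i}")
        # Beyond U+10FFFF is rejected in both modes (chr() would fail anyway).
        if cp > 0x10FFFF:
            raise ValueError(f"code point out of range at offset {i}")
        out.append(chr(cp))
        i += length
    return "".join(out)


print(decode_utf8("café".encode("utf-8")))         # café
print(decode_utf8(b"\xc0\xaf", strict=False))      # /
```

    Calling `decode_utf8(b"\xc0\xaf")` with the default strict mode raises
    `ValueError`; the strict decoder costs only a couple of extra
    comparisons per character.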

    > More philosophically, is it really better to try to improve on the
    > standards?

    I *strongly* doubt that Microsoft is trying to reinvent UTF-8. As I
    said, they were probably trying to "be liberal in what they accept," and
    not have people throw eggs at their windows because some badly encoded
    Web page wouldn't display.

    > The reader can still correct this by choosing the appropriate
    > encoding. Then Microsoft could satisfy everybody by offering 'UTF-8
    > strict' and 'UTF-8 liberal', or better, if the UTF-8 stream contains
    > ill-formed sequences, offering to let the user accept them through a
    > pop-up dialog.

    How many users who do not subscribe to the Unicode list would understand
    how to use an option like that?

    Doug Ewell
    Fullerton, California, USA

