RE: Win IE 7b2 and UTF-8

From: Keutgen, Walter
Date: Mon May 15 2006 - 08:19:19 CDT



    My thought, 'is it really better to try to make it better than
    the standards?', is a more general one, which applies
    to all standardization efforts, certainly in IT.

    I am certainly not blaming Microsoft and the other browser providers
    for trying everything to make web surfing an easy activity.

    In this case, the server owners, authoring tool providers and authors
    are clearly to blame. Is it really so difficult to comply with the HTML
    standard and tag the character encoding correctly?

    If having the user select the correct encoding is unrealistic,
    because only professionals know about this step, what happens
    to those French users who are shown ideographs? Do they solve the
    problem by guessing, or do they just leave such web sites? If this
    selection capability is indeed mostly unknown, that is one more argument
    for autodetection not to treat ill-formed sequences as identifying a
    liberal variant of UTF-8. If a page's HTML content has no suitable
    charset tag and contains bad UTF-8, that web site should be the one that
    is problematic to read, not other sites that are correctly encoded in
    another encoding.
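    As a minimal sketch of the policy argued for above (not a description
    of how IE 7b2 actually works): a declared charset wins; otherwise a page
    is treated as UTF-8 only if it decodes as well-formed UTF-8, and anything
    else falls back to ISO-8859-1, the default suggested in the earlier
    message quoted below. The function name guess_charset is hypothetical,
    and the strictness comes from Python 3's built-in "utf-8" codec, which
    rejects overlong forms, encoded surrogates and truncated sequences.

```python
from typing import Optional


def guess_charset(data: bytes, declared: Optional[str] = None) -> str:
    """Hypothetical charset guesser that never treats ill-formed UTF-8 as UTF-8."""
    # An explicit charset (HTTP header or <meta> tag) always takes priority.
    if declared:
        return declared
    try:
        # Python 3's "utf-8" codec is strict: overlong forms, encoded
        # surrogates and truncated sequences all raise UnicodeDecodeError.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Ill-formed UTF-8 counts as evidence *against* UTF-8, so fall back.
        # Assumption: ISO-8859-1 is the fallback (not, e.g., windows-1252).
        return "iso-8859-1"
    return "utf-8"


# A French page saved as ISO-8859-1 and served without a charset:
print(guess_charset("élève".encode("iso-8859-1")))  # iso-8859-1
# The same text correctly encoded as UTF-8:
print(guess_charset("élève".encode("utf-8")))       # utf-8
```

    Under this policy, only the page that contains bad UTF-8 and no charset
    tag risks being misread; correctly encoded Latin-1 pages are never
    reinterpreted as a "liberal" UTF-8.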

    Best regards



    -----Original Message-----
    From: [] On Behalf Of Doug Ewell
    Sent: Saturday, 13 May 2006 07:07
    To: Unicode Mailing List
    Cc: Keutgen, Walter; Philippe Verdy
    Subject: Re: Win IE 7b2 and UTF-8

    Keutgen, Walter <walter dot keutgen at be dot unisys dot com> wrote:

    > Microsoft should leave the ill-formed UTF-8 sequences aside for the
    > determination of the coded character set.

    I agree that if encodings need to be autodetected, allowing invalid
    UTF-8 to be handled as though it were valid UTF-8 hampers that effort.
    It is a shame (but, as Mark Davis said, probably a given) that
    autodetection is necessary at all.

    > Or alternatively, would it not be simpler to stick to the standards
    > and choose ISO-8859-1 when the HTML source does not provide any
    > charset?

    Actually, the code to do what IE does is of about equal complexity to
    the code to interpret UTF-8 strictly. I doubt complexity had anything
    to do with the choice.
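    To illustrate the complexity point, here is a hypothetical hand-written
    decoder in which strict and liberal decoding differ by a single extra
    check. Which ill-formed sequences IE 7b2 actually accepts is not stated
    here, so "liberal" is assumed, for illustration only, to mean accepting
    overlong forms and encoded surrogate code points.

```python
def decode_utf8(data: bytes, strict: bool = True) -> str:
    """Decode UTF-8 by hand; strict=False is a hypothetical 'liberal' mode."""
    out = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:                      # ASCII, one byte
            out.append(chr(lead))
            i += 1
            continue
        if 0xC0 <= lead <= 0xDF:             # two-byte sequence
            length, cp, minimum = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead <= 0xEF:           # three-byte sequence
            length, cp, minimum = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead <= 0xF7:           # four-byte sequence
            length, cp, minimum = 4, lead & 0x07, 0x10000
        else:                                # stray continuation byte or 0xF8..0xFF
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        for b in data[i + 1:i + length]:
            if (b & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i}")
            cp = (cp << 6) | (b & 0x3F)
        if cp > 0x10FFFF:                    # not a Unicode code point at all
            raise ValueError(f"code point out of range at offset {i}")
        # The only difference between the two modes: strict decoding also
        # rejects overlong forms and UTF-16 surrogate code points.
        if strict and (cp < minimum or 0xD800 <= cp <= 0xDFFF):
            raise ValueError(f"ill-formed sequence at offset {i}")
        out.append(chr(cp))
        i += length
    return "".join(out)
```

    For example, decode_utf8(b"\xc0\x80", strict=False) returns "\x00",
    while the default strict mode raises ValueError for the same two bytes.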

    > More philosophically, is it really better to try making it better than
    > the standards?

    I *strongly* doubt that Microsoft is trying to reinvent UTF-8. As I
    said, they were probably trying to "be liberal in what they accept," and
    not have people throw eggs at their windows because some badly encoded
    Web page wouldn't display.

    > The reader can still correct it by choosing the appropriate encoding.
    > Then Microsoft could satisfy everybody by offering 'UTF-8 strict' and
    > 'UTF-8 liberal' or, better, if the UTF-8 stream contains ill-formed
    > sequences, by asking the user in a pop-up dialogue whether to accept them.

    How many users who do not subscribe to the Unicode list would understand
    how to use an option like that?

    Doug Ewell
    Fullerton, California, USA

    This archive was generated by hypermail 2.1.5 : Mon May 15 2006 - 08:26:26 CDT