From: Keutgen, Walter (firstname.lastname@example.org)
Date: Mon May 15 2006 - 08:19:19 CDT
My thought, 'is it really better to try to make it better than
the standards?', is a more general one, which applies
to all standardization efforts, certainly in IT.
I am certainly not blaming Microsoft and the other browser vendors
for trying everything to make web surfing easy.
In this case, the server owners, authoring-tool vendors and authors
are clearly to blame. Is it really so difficult to comply with the HTML standard
and tag pages correctly?
If expecting users to select the correct encoding is unrealistic,
because only professionals know about this step, what happens
to these French users who are shown ideographs? Do they work it
out by guessing, or do they just leave such web sites? If this selection
capability is indeed mostly unknown, that is one more argument for
autodetection not to treat ill-formed sequences as identifying a liberal
variant of UTF-8. If the HTML content has no suitable tag and contains bad UTF-8,
that web site should be the one that is problematic to read, not others that are correct in
From: email@example.com [mailto:firstname.lastname@example.org] On Behalf Of Doug Ewell
Sent: Saturday, 13 May 2006 07:07
To: Unicode Mailing List
Cc: Keutgen, Walter; Philippe Verdy
Subject: Re: Win IE 7b2 and UTF-8
Keutgen, Walter <walter dot keutgen at be dot unisys dot com> wrote:
> Microsoft should leave the ill-formed UTF-8 sequences aside when
> determining the coded character set.
I agree that if encodings need to be autodetected, allowing invalid
UTF-8 to be handled as though it were valid UTF-8 hampers that effort.
It is a shame (but, as Mark Davis said, probably a given) that
autodetection is necessary at all.
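To make the point about autodetection concrete, here is a minimal sketch in Python of the policy Walter argues for. It is not IE's actual algorithm. The helper name `detect_encoding` and its `declared` parameter are hypothetical; the sketch assumes that a declared charset (HTTP header or `<meta>` tag) always wins, that bytes count as UTF-8 only if they decode as *well-formed* UTF-8, and that ISO-8859-1 is the fallback otherwise.

```python
from typing import Optional


def detect_encoding(data: bytes, declared: Optional[str] = None) -> str:
    """Hypothetical helper: choose an encoding for an HTML payload.

    Assumptions (illustrative only, not any browser's real logic):
    - a declared charset, if present, is trusted as-is;
    - otherwise the bytes are labelled UTF-8 only if Python's strict
      UTF-8 codec accepts them (it rejects overlong forms, encoded
      surrogates and code points above U+10FFFF);
    - anything else falls back to ISO-8859-1.
    """
    if declared:
        return declared
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Ill-formed sequences count as evidence *against* UTF-8,
        # not as a "liberal" variant of it.
        return "iso-8859-1"
    return "utf-8"
```

With this policy, a Latin-1 French page such as `b"caf\xe9"` is never misread as UTF-8, because the lone `0xE9` byte makes the strict decode fail. Pure ASCII is reported as UTF-8, which is harmless since ASCII is a subset of both encodings.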
> Or alternatively, would it not be simpler to stick to the standards
> and choose ISO-8859-1 when the HTML source does not provide any
Actually, the code to do what IE does is of about equal complexity to
the code to interpret UTF-8 strictly. I doubt it had anything to do
> More philosophically, is it really better to try making it better than
> the standards?
I *strongly* doubt that Microsoft is trying to reinvent UTF-8. As I
said, they were probably trying to "be liberal in what they accept," and
not have people throw eggs at their windows because some badly encoded
Web page wouldn't display.
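To illustrate the remark that strict interpretation costs about as much code as IE's approach, the sketch below is a hand-written well-formedness check following the table of well-formed UTF-8 byte sequences in the Unicode Standard (Table 3-7). The function name `is_well_formed_utf8` is hypothetical. The only difference from a naive "liberal" decoder is the narrowed range allowed for the second byte after `E0`, `ED`, `F0` and `F4`, plus the rejection of lead bytes `C0`, `C1` and `F5` to `FF`.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data is well-formed UTF-8 per Unicode Table 3-7."""
    i, n = 0, len(data)
    while i < n:
        b0 = data[i]
        if b0 <= 0x7F:
            i += 1  # ASCII byte
            continue
        # Sequence length and the allowed range for the second byte.
        if 0xC2 <= b0 <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif b0 == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # rules out overlong 3-byte forms
        elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif b0 == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # rules out surrogates U+D800..U+DFFF
        elif b0 == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # rules out overlong 4-byte forms
        elif 0xF1 <= b0 <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif b0 == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # rules out code points above U+10FFFF
        else:
            return False  # 80..C1 and F5..FF never start a well-formed sequence
        if i + length > n:
            return False  # sequence truncated at end of data
        if not lo <= data[i + 1] <= hi:
            return False
        # Remaining bytes must all be plain continuation bytes.
        for j in range(i + 2, i + length):
            if not 0x80 <= data[j] <= 0xBF:
                return False
        i += length
    return True
```

A liberal decoder would accept, for example, the overlong `C0 AF` as "/" or the encoded surrogate `ED A0 80`; the strict version rejects both with a single range comparison each.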
> The reader can still correct this by choosing the appropriate encoding.
> Then Microsoft could satisfy everybody by offering 'UTF-8 strict' and
> 'UTF-8 liberal', or better, if the UTF-8 stream contains ill-formed
> sequences, asking the user in a pop-up dialog whether to accept them.
How many users who do not subscribe to the Unicode list would understand
how to use an option like that?
-- Doug Ewell Fullerton, California, USA http://users.adelphia.net/~dewell/