Re: Strange Behavior by Win IE 6 displaying bad UTF-8

From: Tom Gewecke (tom@bluesky.org)
Date: Sun Apr 23 2006 - 13:09:04 CST

Next message: Richard Wordingham: "Re: Pan-Turkic Alphabet of 1926, Latin letter like U+042C/U+044C or U+0184/U+0185"

Previous message: Karl Pentzlin: "Pan-Turkic Alphabet of 1926, Latin letter like U+042C/U+044C or U+0184/U+0185"
In reply to: Richard Wordingham: "Re: Strange Behavior by Win IE 6 displaying bad UTF-8"
Next in thread: Richard Wordingham: "Re: Strange Behavior by Win IE 6 displaying bad UTF-8"
Reply: Richard Wordingham: "Re: Strange Behavior by Win IE 6 displaying bad UTF-8"

On Apr 23, 2006, at 9:49 AM, Richard Wordingham wrote:

>
> It's actually very simple. Given an initial byte E1, the next two
> bytes must be of the form 10xxxxxx 10xxxxxx. If the parser then
> trusts alleged UTF-8 to be valid UTF-8 (which it should not), it can
> then ignore the non-x bits. Now, it is the second and third bytes
> that are incorrect, being FC and D0 rather than BC and 90, i.e. bit 6
> is 1 whereas it must be 0. The low six bits of FC (wrong) and BC
> (correct) and D0 (wrong) and 90 (correct) are the same.
>

Thanks! This would explain some other weird things I have seen in Win
Outlook, where invalid byte sequences can get displayed as Chinese
characters.
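
To make the mechanism concrete, here is a minimal Python sketch of what such a trusting parser might do with a three-byte sequence. The function names are hypothetical and this is only an illustration of the bit masking Richard describes, not the actual Win IE code, which I have not seen.

```python
def decode3_lenient(seq: bytes) -> str:
    """Decode a 3-byte sequence WITHOUT checking the continuation bytes.

    Assumed behaviour of a parser that trusts its input: it keeps only
    the payload bits (the "x" bits) and ignores the marker bits.
    """
    lead, c1, c2 = seq
    if lead & 0xF0 != 0xE0:          # lead byte must look like 1110xxxx
        raise ValueError("not a 3-byte lead byte")
    # Mask off the top two bits of each trailing byte, valid or not.
    code_point = ((lead & 0x0F) << 12) | ((c1 & 0x3F) << 6) | (c2 & 0x3F)
    return chr(code_point)


def is_valid_utf8(seq: bytes) -> bool:
    """Strict check using Python's built-in decoder, which rejects
    trailing bytes that are not of the form 10xxxxxx."""
    try:
        seq.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


bad = bytes([0xE1, 0xFC, 0xD0])      # the sequence from this thread
good = bytes([0xE1, 0xBC, 0x90])     # the correct encoding

# Both yield U+1F10 under the lenient decoder, because FC/BC and D0/90
# share the same low six bits.
print(hex(ord(decode3_lenient(bad))))    # 0x1f10
print(hex(ord(decode3_lenient(good))))   # 0x1f10
print(is_valid_utf8(bad), is_valid_utf8(good))  # False True
```

A strict decoder rejects the bad sequence outright, while the lenient one silently produces the same character as the correct bytes, which is why the page looks fine to the reader.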

Apparently there is also some code around which generates erroneous
UTF-8 like this, and the error is then pretty hard for a Win IE user to
detect, since the text still displays as intended.

Any security issues from this ability to read invalid UTF-8 as if it
were valid?


This archive was generated by hypermail 2.1.5 : Sun Apr 23 2006 - 13:11:03 CST