Re: Strange Behavior by Win IE 6 displaying bad UTF-8

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Apr 23 2006 - 14:31:37 CST

Next message: António Martins-Tuválkin: "Re: Pan-Turkic Alphabet of 1926, Latin letter like U+042C/U+044C or U+0184/U+0185"

Previous message: Richard Wordingham: "Re: Pan-Turkic Alphabet of 1926, Latin letter like U+042C/U+044C or U+0184/U+0185"
In reply to: Tom Gewecke: "Re: Strange Behavior by Win IE 6 displaying bad UTF-8"
Next in thread: Tom Gewecke: "Strange Behavior by Win IE 6 displaying bad UTF-8"
Reply: Tom Gewecke: "Strange Behavior by Win IE 6 displaying bad UTF-8"

Tom Gewecke wrote on Sunday, April 23, 2006 at 8:09 PM

> Any security issues from this ability to read invalid UTF-8 as if it were
> valid?

Yes.

The contents of a file or incoming message may be read differently at
different points in the system. If the text is **always** converted to a
sequence of codepoints (or UTF-16) before anything is done with it, *I*
can't see any problem - but perhaps others can.
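A minimal sketch of that "convert first" discipline, assuming Python and its
built-in strict UTF-8 decoder. The names ingest and is_allowed are
hypothetical, chosen only for illustration: the bytes are decoded exactly
once at the boundary, ill-formed input is rejected rather than repaired, and
every later check works on code points.

```python
def ingest(raw: bytes) -> str:
    """Decode once at the boundary; ill-formed UTF-8 raises instead of being repaired."""
    return raw.decode("utf-8", errors="strict")


def is_allowed(text: str) -> bool:
    """Hypothetical policy check, run on decoded code points rather than raw bytes."""
    return "delete" not in text.lower()


good = ingest("list files".encode("utf-8"))

try:
    # 0xC1 0xA4 is an overlong (invalid) two-byte form of 'd'.
    ingest(b"\xc1\xa4elete")
    rejected = False
except UnicodeDecodeError:
    rejected = True
```

Because nothing downstream ever sees the raw bytes, there is only one
interpretation of the text for the check and the consumer to disagree about.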

Suppose you were receiving incoming instructions, and you would be happy
executing them if they did not contain the word 'delete'. You can search
valid UTF-8 text bytewise for particular text. However, if it is encoded
invalidly, you will not find the text even though the over-tolerant
interpreter may recover the word 'delete'.
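The scenario above can be sketched as follows. This is an illustration under
assumptions, not a description of what any particular browser does:
overlong2 and lenient_decode are hypothetical helpers, and the lenient
decoder simply skips the validity checks that a conforming decoder performs.

```python
def overlong2(ch: str) -> bytes:
    """Encode an ASCII character as an invalid (overlong) two-byte UTF-8 sequence."""
    cp = ord(ch)
    if cp >= 0x80:
        raise ValueError("ASCII only")
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


def lenient_decode(data: bytes) -> str:
    """Hypothetical over-tolerant decoder: accepts overlong forms without complaint."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            cp, n = b, 1
        elif b >> 5 == 0b110:
            cp, n = b & 0x1F, 2
        elif b >> 4 == 0b1110:
            cp, n = b & 0x0F, 3
        elif b >> 3 == 0b11110:
            cp, n = b & 0x07, 4
        else:
            cp, n = 0xFFFD, 1  # stray byte: substitute U+FFFD
        # Fold in the trailing bytes without checking that they are well formed.
        for c in data[i + 1:i + n]:
            cp = (cp << 6) | (c & 0x3F)
        out.append(chr(cp) if cp <= 0x10FFFF else "\ufffd")
        i += n
    return "".join(out)


payload = b"please " + b"".join(overlong2(c) for c in "delete") + b" everything"

bytewise_hit = b"delete" in payload      # what a bytewise filter sees
lenient_text = lenient_decode(payload)   # what the tolerant interpreter sees

try:
    payload.decode("utf-8")              # a strict decoder refuses the input
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
```

Here the bytewise search misses the word, the tolerant decoder recovers it,
and a strict decoder rejects the whole message.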

Someone therefore needs to worry about this. It may be that there is no
problem, but it depends on how pervasive the over-tolerance is. I hope this
issue is always borne in mind when the e-mail/browser systems are enhanced,
and there might be a big problem with a port from a UTF-16 based system to a
UTF-8 based system. It is possible that the bug is tolerated because users
would complain if web sites suddenly became unreadable.

Richard.


This archive was generated by hypermail 2.1.5 : Sun Apr 23 2006 - 14:33:41 CST