From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Apr 23 2006 - 14:31:37 CST
Tom Gewecke wrote on Sunday, April 23, 2006 at 8:09 PM
> Any security issues from this ability to read invalid UTF-8 as if it were
> valid?
Yes.
The contents of a file or incoming message may be read differently at
different points in the system. If the text is **always** converted to a
sequence of codepoints (or UTF-16) before anything is done with it, *I*
can't see any problem - but perhaps others can.
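A minimal sketch of that "convert first" discipline, assuming Python and a
hypothetical function name `accept_command` (neither comes from this
thread). The input is decoded once with a strict UTF-8 decoder, which
rejects malformed and overlong sequences, and every later check runs on the
decoded text rather than on the raw bytes:

```python
from typing import Optional

def accept_command(raw: bytes) -> Optional[str]:
    """Hypothetical gatekeeper: decode strictly once, then check the text."""
    try:
        # Python's UTF-8 codec is strict: overlong forms such as C1 A4 raise.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # reject malformed input rather than guess at its meaning
    if "delete" in text:
        return None  # the check sees the same characters any later stage sees
    return text

print(accept_command(b"list files"))
print(accept_command(b"\xc1\xa4elete all"))
```

Because nothing downstream ever sees bytes that the strict decoder refused,
the filter and the interpreter cannot disagree about what the text says.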
Suppose you were receiving incoming instructions, and you would be happy
executing them if they did not contain the word 'delete'. You can search
valid UTF-8 text bytewise for particular text. However, if the word is
encoded invalidly (for example, with an overlong byte sequence), a bytewise
search will not find it, even though an over-tolerant interpreter may still
recover the word 'delete'.
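To make the scenario concrete, here is a sketch in Python. The decoder
`lenient_decode` is hypothetical and deliberately buggy: it accepts overlong
2-byte sequences (and, to stay short, handles only ASCII and 2-byte forms,
mapping anything else to U+FFFD). The letter 'd' (U+0064) is written as the
overlong pair C1 A4, so a bytewise search for b"delete" misses it while the
tolerant decoder still produces 'delete':

```python
def lenient_decode(data: bytes) -> str:
    """Hypothetical over-tolerant decoder that accepts overlong 2-byte forms."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))  # plain ASCII byte
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF:
            # Deliberate bug: no check that the value is >= 0x80, so the
            # overlong pair C1 A4 decodes to 'd' instead of being rejected.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            out.append("\ufffd")  # everything else is replaced in this sketch
            i += 1
    return "".join(out)

def bytewise_filter_allows(raw: bytes) -> bool:
    """Gatekeeper that searches the raw bytes for the UTF-8 form of 'delete'."""
    return b"delete" not in raw

payload = b"\xc1\xa4elete all"  # 'd' encoded as the overlong sequence C1 A4
print(bytewise_filter_allows(payload))  # True: no b"delete" in the bytes
print(lenient_decode(payload))          # delete all
```

The filter and the interpreter each behave "reasonably" on their own; the
hole comes from their disagreeing about what the same bytes mean.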
Someone therefore needs to worry about this. It may be that there is no
problem, but it depends how pervasive the over-tolerance is. I hope this
issue is always borne in mind when the e-mail/browser systems are enhanced,
and there might be a big problem with a port from a UTF-16 based system to a
UTF-8 based system. It is possible that the bug is tolerated because users
would complain if web sites suddenly became unreadable.
Richard.
This archive was generated by hypermail 2.1.5 : Sun Apr 23 2006 - 14:33:41 CST