Re: Strange Behavior by Win IE 6 displaying bad UTF-8

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Apr 23 2006 - 14:31:37 CST

    Tom Gewecke wrote on Sunday, April 23, 2006 at 8:09 PM

    > Any security issues from this ability to read invalid UTF-8 as if it were
    > valid?

    Yes.

    The contents of a file or incoming message may be read differently at
    different points in the system. If the text is **always** converted to a
    sequence of codepoints (or UTF-16) before anything is done with it, *I*
    can't see any problem - but perhaps others can.
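
    A minimal sketch of that "decode first, then inspect" discipline, in
    Python. The function name contains_forbidden and the word list are
    hypothetical, and the sketch assumes a strict decoder is used at the
    boundary (Python's built-in UTF-8 codec rejects invalid sequences by
    default), which goes slightly beyond what is stated above.

```python
def contains_forbidden(raw: bytes, word: str = "delete") -> bool:
    """Decode the input once, then search the decoded text, not the raw bytes.

    bytes.decode("utf-8") uses errors="strict" by default, so ill-formed
    input (for example an overlong sequence) raises UnicodeDecodeError
    instead of being silently reinterpreted.
    """
    text = raw.decode("utf-8")
    return word in text


def safe_to_execute(raw: bytes) -> bool:
    """Hypothetical policy: reject anything undecodable or containing the word."""
    try:
        return not contains_forbidden(raw)
    except UnicodeDecodeError:
        # Invalid UTF-8 is refused outright rather than guessed at.
        return False
```

    Every later stage then works on the same decoded text, so there is
    no second reading of the bytes that could disagree with the check.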

    Suppose you were receiving incoming instructions, and you would be happy
    executing them if they did not contain the word 'delete'. You can search
    valid UTF-8 text bytewise for particular text. However, if it is encoded
    invalidly, you will not find the text even though the over-tolerant
    interpreter may recover the word 'delete'.
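
    To make the mismatch concrete, here is a small Python sketch. The
    function lenient_decode is a hypothetical stand-in for an
    over-tolerant interpreter, not the actual behaviour of any
    particular browser; it accepts overlong two-byte forms. The payload
    encodes 'd' (U+0064) as the invalid two-byte sequence C1 A4.

```python
def lenient_decode(data: bytes) -> str:
    """Hypothetical over-tolerant decoder: accepts overlong 2-byte sequences."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Plain ASCII byte.
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF:
            # Two-byte form, with no check that the result is >= U+0080.
            # A conforming decoder must reject lead bytes C0 and C1.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            # Anything else is replaced; longer forms are out of scope here.
            out.append("\ufffd")
            i += 1
    return "".join(out)


payload = b"\xc1\xa4elete all"           # overlong 'd' followed by "elete all"
found_bytewise = b"delete" in payload    # the filter's view of the bytes
recovered = lenient_decode(payload)      # the interpreter's view of the text
```

    Here found_bytewise is False while recovered contains 'delete', so
    the filter and the interpreter disagree about the same input.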

    Someone therefore needs to worry about this. It may be that there is no
    problem, but it depends on how pervasive the over-tolerance is. I hope this
    issue is always borne in mind when the e-mail/browser systems are enhanced,
    and there might be a big problem with a port from a UTF-16 based system to a
    UTF-8 based system. It is possible that the bug is tolerated because users
    would complain if web sites suddenly became unreadable.

    Richard.


