Re: Filtering and displaying untrusted UTF-8

From: verdy_p
Date: Sun Dec 27 2009 - 20:48:31 CST

    [...] wrote:
    > My final question is this: which of the (in the previous steps)
    > allowed code points ***higher than*** 127 do I have to "HTML encode"
    > if I display them in an HTML page? None? Or is it possible that
    > characters with code points outside the US-ASCII range may be
    > interpreted by the browser in a similar way to < & and > in the
    > US-ASCII range, thereby allowing for an XSS attack?

    Maybe the NEXT LINE character (NEL, U+0085)? It is a C1 control, present at position 0x85 in all the ISO 8859 charsets
    as registered for MIME, and it may be interpreted as a line separator or as blank space when displayed in HTML.
    You may want to replace it with a CR+LF sequence. Alternatively, you may want to normalize the various newline encodings
    (CR not followed by LF, CR+LF, LF not preceded by CR, and NEL) into a single one on input (such as LF, for compatibility
    with C standard I/O), and then generate CR+LF on output.
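A minimal Python sketch of the normalization suggested above, assuming the input has already been decoded to a Unicode string. The function names `normalize_newlines`, `to_crlf` and `escape_for_html` are hypothetical, chosen for illustration only. The last helper uses the standard library's `html.escape`, which escapes only `&`, `<`, `>`, `"` and `'` and leaves every code point above 127 (including NEL) untouched, so NEL is folded into LF before escaping.

```python
import html

NEL = "\u0085"  # NEXT LINE, the C1 control at 0x85 in the MIME-registered ISO 8859 charsets


def normalize_newlines(text: str) -> str:
    """On input: map CR+LF, a lone CR, and NEL to a single LF (C stdio convention)."""
    # Collapse CR+LF first, so its CR is not turned into a second LF below.
    text = text.replace("\r\n", "\n")
    text = text.replace("\r", "\n")  # any CR not followed by LF
    text = text.replace(NEL, "\n")
    return text


def to_crlf(text: str) -> str:
    """On output: turn LF-normalized text back into CR+LF line ends."""
    return text.replace("\n", "\r\n")


def escape_for_html(text: str) -> str:
    """Hypothetical helper: normalize newlines, then escape the ASCII HTML specials."""
    return html.escape(normalize_newlines(text))


if __name__ == "__main__":
    raw = "line1\r\nline2\rline3\u0085line4"
    print(repr(normalize_newlines(raw)))
    print(repr(to_crlf(normalize_newlines(raw))))
```

Normalizing to a single internal convention on input means later filtering steps only have to recognize one newline form; the choice of CR+LF on output is a separate, per-protocol decision.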
