Re: The "prohibited" encodings...

From: Doug Ewell (
Date: Wed Dec 30 2009 - 08:30:07 CST


    "Phillips, Addison" <addison at amazon dot com> wrote:

    > UTF-7, BOCU, and SCSU are banned either because they auto-detect as
    > something other than themselves or because an otherwise "innocuous"
    > byte sequence detects as being one of them, thus serving as the basis
    > for an XSS attack.

    What does SCSU auto-detect as? In an HTML or XML environment, where the
    stream starts with a Basic Latin run, SCSU should look like Latin-1,
    eventually followed by a single-byte-mode tag: a C0 control character
    other than NUL, HT, CR, FF, or LF. (If there is no SCSU tag, then the
    text *is* Latin-1 except that the single-byte tags are prefixed with

    An initial run of ASCII followed by, say, 0x12 ought to be a reliable
    sign of SCSU, unless you have reason to suspect VISCII. The only time
    this would fail is if the encoder author decided to be a smart-aleck and
    switch into Unicode mode to encode initial ASCII.
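    The check described above can be sketched roughly as follows. This is a
    hypothetical helper (`sniff_scsu` is not part of any library). It uses
    the pass-through control set exactly as listed in this message, looks
    only at the first byte that breaks the ASCII run, and does not try to
    decode SCSU.

```python
# Control bytes treated as ordinary text (NUL, HT, LF, FF, CR), following
# the list given in the message; any other C0 byte is taken as an SCSU tag.
PASS_THROUGH = {0x00, 0x09, 0x0A, 0x0C, 0x0D}


def sniff_scsu(data: bytes) -> str:
    """Classify a byte stream that is expected to start with Basic Latin.

    Returns "scsu" if the first byte outside printable ASCII is a C0
    control that would act as a single-byte-mode tag, "latin-1" if a
    byte in 0x80-0xFF shows up first, and "ascii" if neither occurs.
    """
    for b in data:
        if b < 0x20 and b not in PASS_THROUGH:
            # e.g. 0x12, which UTS #6 defines as SC2 ("change to dynamic
            # window 2"). Caveat from the message: VISCII text can also
            # contain bytes in this range.
            return "scsu"
        if b >= 0x80:
            # No tag seen yet. SCSU's default dynamic window 0 covers
            # U+0080..U+00FF, so these bytes would mean the same as Latin-1.
            return "latin-1"
    return "ascii"


print(sniff_scsu(b"<html>\x12\xc0"))
print(sniff_scsu(b"<p>caf\xe9</p>"))
```

    As the message notes, the check fails if an encoder switches into
    Unicode mode before emitting the initial ASCII, since there is then
    no plain Basic Latin run at the start.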

    BOCU-1, on the other hand, auto-detects as Latin-1 gibberish.
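    To see why, here is a minimal BOCU-1 encoder restricted to ASCII input.
    It is a sketch assuming the constants used by ICU's BOCU-1 converter
    (initial "prev" of 0x40, single-byte lead 0x90 + difference); the
    function name is hypothetical, and anything beyond U+007F is rejected
    rather than encoded.

```python
ASCII_PREV = 0x40  # initial / reset value of "prev" (assumed, per ICU)
MIDDLE = 0x90      # byte used for a difference of zero


def bocu1_encode_ascii(text: str) -> bytes:
    """Encode ASCII-only text as BOCU-1 (sketch; rejects other input)."""
    prev = ASCII_PREV
    out = bytearray()
    for ch in text:
        c = ord(ch)
        if c > 0x7F:
            raise ValueError("this sketch only handles U+0000..U+007F")
        if c <= 0x20:
            # Controls and space are written as themselves; every one
            # except space also resets prev.
            if c != 0x20:
                prev = ASCII_PREV
            out.append(c)
            continue
        # For ASCII the difference is -0x1F..0x3F, so one byte suffices.
        out.append(MIDDLE + (c - prev))
        # Next prev is the middle of c's 128-code-point block (0x40 here).
        prev = (c & ~0x7F) + ASCII_PREV
    return bytes(out)


encoded = bocu1_encode_ascii("<html>")
print(encoded.hex(" "))                 # 8c b8 c4 bd bc 8e
print(repr(encoded.decode("latin-1")))  # high bytes, not readable ASCII
```

    Every printable ASCII character lands at 0x71..0xCF, so even markup
    comes out as high-bit bytes that a Latin-1 sniffer happily accepts,
    with nothing like SCSU's tag bytes to give it away.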

    Doug Ewell  |  Thornton, Colorado, USA  |
    RFC 5645, 4645, UTN #14  |  ietf-languages @

    This archive was generated by hypermail 2.1.5 : Wed Dec 30 2009 - 08:32:20 CST