    "Phillips, Addison" <addison at amazon dot com> wrote:

    > UTF-7, BOCU, and SCSU are banned either because they auto-detect as
    > something other than themselves or because an otherwise "innocuous"
    > byte sequence detects as being one of them, thus serving as the basis
    > for an XSS attack.

    What does SCSU auto-detect as? In an HTML or XML environment, where the
    stream starts with a Basic Latin run, SCSU should look like Latin-1,
    eventually followed by a single-byte mode tag: a C0 control character
    other than NUL, HT, LF, FF, or CR. (If there is no SCSU tag, then the
    text *is* Latin-1 except that the single-byte tags are prefixed with

    An initial run of ASCII followed by, say, 0x12 ought to be a reliable
    sign of SCSU, unless you have reason to suspect VISCII. The only time
    this would fail is if the encoder author decided to be a smart-aleck and
    switch into Unicode mode to encode initial ASCII.
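
    A minimal sketch of that heuristic, in Python. The function name and
    the two byte sets are illustrative assumptions, not taken from any
    library; the tag values follow my reading of the SCSU specification
    (UTS #6) and should be checked against it. It does not try to rule
    out VISCII, and it deliberately fails on the Unicode-mode case
    described above.

```python
# SCSU single-byte mode tag bytes (assumed from UTS #6):
# SQ0-SQ7 (0x01-0x08), SDX (0x0B), SQU/SCU (0x0E-0x0F),
# SC0-SC7 (0x10-0x17), SD0-SD7 (0x18-0x1F).
SCSU_TAGS = frozenset(range(0x01, 0x09)) | {0x0B} | frozenset(range(0x0E, 0x20))

# Bytes that read as ordinary ASCII text: HT, LF, CR, and 0x20-0x7F.
ASCII_TEXT = frozenset({0x09, 0x0A, 0x0D}) | frozenset(range(0x20, 0x80))


def looks_like_scsu(data: bytes) -> bool:
    """Hypothetical heuristic: a non-empty ASCII run, then an SCSU tag byte."""
    for i, b in enumerate(data):
        if b in ASCII_TEXT:
            continue  # still inside the initial ASCII run
        # First non-ASCII byte decides: it must be a tag, and must not be
        # the very first byte of the stream.
        return i > 0 and b in SCSU_TAGS
    return False  # pure ASCII (or empty): no evidence of SCSU
```

    For example, looks_like_scsu(b"<html>\x12...") is true, while a
    stream that opens with the Unicode-mode tag 0x0F, or whose first
    non-ASCII byte is a Latin-1 letter such as 0xE9, is rejected.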

    BOCU-1, on the other hand, auto-detects as Latin-1 gibberish.

