RE: HTML5 encodings (was: Re: BOCU patent)

From: Chris Weber (chris@casabasecurity.com)
Date: Mon Dec 21 2009 - 12:10:02 CST

    I disagree with this statement. Security can be related to an attacker's ability to influence the auto-discovery of an encoding, but it isn't limited to that scenario.

    "The security issue is largely a red herring. Security of HTML encodings
    is related to incorrect auto-discovery of encodings, not to using
    encodings that have been properly announced."

    In the world of Web apps, most encoding-related security vulnerabilities and exploits come from an attacker's ability to control the charset declared for the page. In other words, an attacker injects a persistent UTF-7 encoded payload, then lures a victim to a page where that payload will render AND where the attacker can set the META or HTTP header charset to UTF-7. In this case the browser isn't auto-discovering anything: it sees UTF-7 as a valid declaration, and the Web app is blind, just delivering data.
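
    To make the scenario concrete, here is a minimal Python sketch. The payload, the `naive_filter_allows` check and the `render` helper are hypothetical illustrations, not taken from any real Web app or browser; `render` only approximates "decode the bytes under whatever charset was declared" using Python's built-in codecs.

```python
# Hypothetical stored payload: "<script>alert(1)</script>" written in UTF-7.
# The bytes are pure ASCII and contain no literal "<" or ">".
PAYLOAD = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"


def naive_filter_allows(data: bytes) -> bool:
    """Hypothetical ASCII-only input filter: reject anything with '<' or '>'."""
    return b"<" not in data and b">" not in data


def render(data: bytes, declared_charset: str) -> str:
    """Decode the bytes as a consumer would under the declared charset."""
    return data.decode(declared_charset)


if __name__ == "__main__":
    # The filter sees nothing suspicious and stores the payload.
    print(naive_filter_allows(PAYLOAD))
    # Declared as UTF-8, the payload stays inert text.
    print(render(PAYLOAD, "utf-8"))
    # Declared as UTF-7, the same bytes become markup.
    print(render(PAYLOAD, "utf-7"))
```

    The bytes never change between the two `render` calls; only the declared charset does, which is why control over the declaration is enough for the attack.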

    -Chris

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of Doug Ewell
    Sent: Monday, December 21, 2009 6:38 AM
    To: Unicode Mailing List
    Cc: Peter Krefting
    Subject: Re: HTML5 encodings (was: Re: BOCU patent)

    Peter Krefting <peter at opera dot com> wrote:

    >> "User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
    >> encodings."
    >>
    >> Amazing, isn't it? So thoughtful of the HTML 5 WG to protect
    >> developers' time by prohibiting a handful of selected encodings.
    >
    > There are some security issues related to these, and they are very
    > rarely used on actual web pages, which is why they are on the
    > "prohibited" list. Full reasoning behind it can probably be found on
    > the HTML5 mailing list, although I do not have a link to share. One of
    > the problems is that they are not ASCII based, and theoretically
    > something like "<script>" can be encoded in such a way that a naïve
    > ASCII-based parser wouldn't find it and filter it away from
    > user-submitted input, making it easier to do cross-domain attacks.

    SCSU is completely ASCII-based, as long as the text is in single-byte
    mode, which would be the case for the entire HTML header, and usually
    the entire text when encoding small alphabets. In "Unicode mode," SCSU
    is essentially UTF-16BE (with a non-ASCII escape for some private-use
    characters), and UTF-16BE is not prohibited.

    The security issue is largely a red herring. Security of HTML encodings
    is related to incorrect auto-discovery of encodings, not to using
    encodings that have been properly announced. Even UTF-7, while
    generally undesirable and unnecessary for Web pages, is "secure" if
    correctly identified.

    Henri Sivonen stated that the main reason for prohibiting encodings was
    to avoid "wasting developer time" and focusing attention on support of
    new features instead. Apparently he didn't feel developers were capable
    of both.

    --
    Doug Ewell  |  Thornton, Colorado, USA  |  http://www.ewellic.org
    RFC 5645, 4645, UTN #14  |  ietf-languages @ http://is.gd/2kf0s
    

