HTML5 encodings (was: Re: BOCU patent)

From: Peter Krefting
Date: Mon Dec 21 2009 - 02:18:14 CST

    Doug Ewell:

    > "User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
    > encodings."
    > Amazing, isn't it? So thoughtful of the HTML 5 WG to protect
    > developers' time by prohibiting a handful of selected encodings.

    There are some security issues related to these, and they are very rarely
    used on actual web pages, which is why they are on the "prohibited" list.
    The full reasoning can probably be found on the HTML5 mailing list,
    although I do not have a link to share. One of the problems is that
    several of them are not ASCII-compatible, so in theory something like
    "<script>" can be encoded in such a way that a naïve ASCII-based parser
    would not find it and filter it out of user-submitted input, making
    cross-site scripting attacks easier.
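
    To make the "<script>" point concrete, here is a minimal sketch using
    UTF-7 and Python's built-in "utf-7" codec. The filter function and the
    payload are hypothetical, made up for illustration; a real sanitizer
    and a real attack would look different.

```python
# Hypothetical filter and payload, for illustration only.

def naive_filter_allows(raw: bytes) -> bool:
    """Return True if an ASCII-only scan finds no "<script" in the bytes."""
    return b"<script" not in raw.lower()

# "<script>alert(1)</script>" with "<" and ">" written as UTF-7 shift
# sequences (+ADw- is U+003C, +AD4- is U+003E).
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# The byte-level scan sees no "<" at all, so the input is let through.
passed_filter = naive_filter_allows(payload)

# A user agent that decodes the page as UTF-7 recovers the markup.
decoded = payload.decode("utf-7")

print(passed_filter)  # True
print(decoded)        # <script>alert(1)</script>
```

    The filter and the user agent disagree about what the bytes mean, and
    that gap is the problem. A user agent that refuses to decode UTF-7 at
    all never sees the script element.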

    > I can support Fieldata or PTTC/EBCD in my user agent if I want to, but
    > not UTF-7 or SCSU.

    You can even support PETSCII if you wish to (or maybe not, as HTML is
    defined as Unicode data and PETSCII cannot be fully converted to Unicode),
    but none of these (AFAIK) pose the security risks that come with the
    prohibited ones.

    \\// Peter Krefting - all opinions here are my own, etc, etc

    This archive was generated by hypermail 2.1.5 : Mon Dec 21 2009 - 02:22:36 CST