Re: HTML5 encodings (was: Re: BOCU patent)

From: Doug Ewell (doug@ewellic.org)
Date: Wed Dec 23 2009 - 00:26:08 CST

    Peter Krefting <peter at opera dot com> wrote:

    >> SCSU is completely ASCII-based, as long as the text is in single-byte
    >> mode, which would be the case for the entire HTML header, and usually
    >> the entire text when encoding small alphabets.
    >
    > True, but IIRC you can also encode the ASCII characters using other
    > methods, and still have parts in ASCII, meaning that you could put
    > some SCSU inside an ASCII document and have the whole document as
    > valid SCSU. This could be a security risk if the container document
    > did not declare its encoding (think comments to a blog post here).

    There aren't that many realistic ways to encode ASCII characters in
    SCSU. You could encode the ASCII bytes directly, or you could switch to
    Unicode mode and encode them as UTF-16BE. There are other
    possibilities, including gratuitous SDn tags with arguments in the ASCII
    range, but nothing that opens the door for attacks any more than an East
    Asian DBCS.
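
The alternatives can be made concrete in a minimal Python sketch. It assumes the tag values from Unicode Technical Standard #6 (SQ0 = 0x01, SCU = 0x0F, UC0 = 0xE0) and the default window positions. It encodes ASCII text three ways: direct bytes, SCU followed by UTF-16BE, and SQ0 quoting from static window 0, which starts at U+0000. It then decodes only that subset. The function names are hypothetical, and this is not a complete SCSU decoder.

```python
# Tag byte values from UTS #6 (SCSU). Only these three are used in this sketch.
SQ0 = 0x01  # single-byte mode: quote one character from window 0
SCU = 0x0F  # single-byte mode: switch to Unicode (UTF-16BE) mode
UC0 = 0xE0  # Unicode mode: switch back to single-byte mode, select window 0

# In single-byte mode these control bytes are characters, not tags.
PASS_THROUGH = frozenset({0x00, 0x09, 0x0A, 0x0D})


def encode_direct(text: str) -> bytes:
    """Single-byte mode: each ASCII character is its own byte."""
    return text.encode("ascii")


def encode_unicode_mode(text: str) -> bytes:
    """SCU, then UTF-16BE. ASCII code units have high byte 0x00, which is
    never one of the Unicode-mode tag bytes (0xE0-0xF2)."""
    text.encode("ascii")  # raises UnicodeEncodeError for non-ASCII input
    return bytes([SCU]) + text.encode("utf-16-be")


def encode_quoted(text: str) -> bytes:
    """SQ0 before every byte. Arguments below 0x80 index static window 0,
    which starts at U+0000, so SQ0 0x41 is 'A'."""
    out = bytearray()
    for b in text.encode("ascii"):
        out += bytes([SQ0, b])
    return bytes(out)


def decode_subset(data: bytes) -> str:
    """Decode only the constructs produced above. Other tags (SCn, SDn,
    SQ1-SQ7, SQU, UQU, ...) raise NotImplementedError; surrogate pairs are
    not combined; truncated input raises IndexError."""
    out, i, unicode_mode = [], 0, False
    while i < len(data):
        b = data[i]
        if unicode_mode:
            if b == UC0:
                unicode_mode = False
                i += 1
            elif 0xE0 <= b <= 0xF2:
                raise NotImplementedError(f"Unicode-mode tag 0x{b:02X}")
            else:
                out.append(chr((b << 8) | data[i + 1]))  # one UTF-16BE unit
                i += 2
        elif b in PASS_THROUGH or b >= 0x20:
            # 0x20-0x7F are ASCII; 0x80-0xFF index dynamic window 0, whose
            # default offset U+0080 makes byte b decode to chr(b).
            out.append(chr(b))
            i += 1
        elif b == SQ0 and data[i + 1] < 0x80:
            out.append(chr(data[i + 1]))
            i += 2
        elif b == SCU:
            unicode_mode = True
            i += 1
        else:
            raise NotImplementedError(f"tag 0x{b:02X} not handled by this sketch")
    return "".join(out)
```

For example, "Hi" becomes b"Hi", b"\x0f\x00H\x00i", or b"\x01H\x01i", and all three decode to the same text under this sketch.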

    Remember that ISO 8859-1 with no control characters besides NUL, HT, CR,
    and LF is also valid SCSU.
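
The ISO 8859-1 claim can be checked mechanically. Here is a minimal Python sketch, assuming the single-byte-mode tag layout from Unicode Technical Standard #6 and the default dynamic window 0 at U+0080. The function names are hypothetical. It reports the first byte that SCSU would read as a tag. C1 bytes 0x80-0x9F are not flagged, since window 0 maps them to U+0080-U+009F.

```python
from typing import Optional

# In SCSU single-byte mode, 0x01-0x08, 0x0B, 0x0C and 0x0E-0x1F are tags (or
# reserved). Every other byte decodes to the Unicode character with the same
# value: 0x00-0x7F as ASCII, 0x80-0xFF through dynamic window 0 at U+0080.
NON_TAG_CONTROLS = frozenset({0x00, 0x09, 0x0A, 0x0D})  # NUL, HT, LF, CR


def first_scsu_conflict(data: bytes) -> Optional[int]:
    """Index of the first byte SCSU would read as a tag, or None."""
    for i, b in enumerate(data):
        if b < 0x20 and b not in NON_TAG_CONTROLS:
            return i
    return None


def latin1_is_valid_scsu(data: bytes) -> bool:
    """True if these ISO 8859-1 bytes decode to the same text under SCSU."""
    return first_scsu_conflict(data) is None
```

Form feed (0x0C) is an easy one to trip over: it is a reserved byte in SCSU single-byte mode, so a Latin-1 document containing it is not also valid SCSU.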

    > Well, here at Opera we had to disable support for two encodings (UTF-7
    > and UTF-32) to become HTML5 conformant, if that isn't a waste of
    > developer time, I don't know what is :-)

    Agreed. It would have been understandable to say that authors of HTML 5
    documents SHOULD NOT use such-and-such encodings, and browsers SHOULD NOT
    recognize them in the absence of an explicit encoding declaration, but
    to forbid them under all circumstances is unnecessary.

    --
    Doug Ewell  |  Thornton, Colorado, USA  |  http://www.ewellic.org
    RFC 5645, 4645, UTN #14  |  ietf-languages @ http://is.gd/2kf0s
    


    This archive was generated by hypermail 2.1.5 : Wed Dec 23 2009 - 00:29:00 CST