Re: HTML5 encodings (was: Re: BOCU patent)

From: Doug Ewell (doug@ewellic.org)
Date: Wed Dec 23 2009 - 00:26:08 CST

Next message: André Szabolcs Szelp: "Re: HTML5 encodings (was: Re: BOCU patent)"

Previous message: Doug Ewell: "Re: HTML5 encodings (was: Re: BOCU patent)"
In reply to: Peter Krefting: "Re: HTML5 encodings (was: Re: BOCU patent)"
Next in thread: André Szabolcs Szelp: "Re: HTML5 encodings (was: Re: BOCU patent)"

Peter Krefting <peter at opera dot com> wrote:

>> SCSU is completely ASCII-based, as long as the text is in single-byte
>> mode, which would be the case for the entire HTML header, and usually
>> the entire text when encoding small alphabets.
>
> True, but IIRC you can also encode the ASCII characters using other
> methods, and still have parts in ASCII, meaning that you could put
> some SCSU inside an ASCII document and have the whole document as
> valid SCSU. This could be a security risk if the container document
> did not declare its encoding (think comments to a blog post here).

There aren't that many realistic ways to encode ASCII characters in
SCSU. You could encode the ASCII bytes directly, or you could switch to
Unicode mode and encode them as UTF-16BE. There are other
possibilities, including gratuitous SDn tags with arguments in the ASCII
range, but nothing that opens the door for attacks any more than an East
Asian DBCS.
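
To make the two main routes concrete, here is a rough Python sketch. The
tag values SCU (0x0F) and UC0 (0xE0) are from UTS #6; the function names
are made up for this example, and decode_subset is deliberately not a
full SCSU decoder. It understands only plain single-byte ASCII and
Unicode-mode runs, and rejects everything else.

```python
# SCSU tag bytes as defined in UTS #6.
SCU = 0x0F  # single-byte mode: switch to Unicode mode
UC0 = 0xE0  # Unicode mode: return to single-byte mode, dynamic window 0

# Bytes that stand for themselves in single-byte mode (ASCII range only).
PLAIN = {0x00, 0x09, 0x0A, 0x0D} | set(range(0x20, 0x80))


def _require_plain(text: str) -> None:
    """Reject characters this sketch does not handle (hypothetical helper)."""
    if any(ord(ch) not in PLAIN for ch in text):
        raise ValueError("only NUL, HT, LF, CR and U+0020..U+007F are supported")


def ascii_direct(text: str) -> bytes:
    """Route 1: each ASCII character is its own byte in single-byte mode."""
    _require_plain(text)
    return text.encode("ascii")


def ascii_unicode_mode(text: str) -> bytes:
    """Route 2: SCU tag, then the same characters as UTF-16BE code units."""
    _require_plain(text)
    return bytes([SCU]) + text.encode("utf-16-be")


def decode_subset(data: bytes) -> str:
    """Decode only the two forms above; anything else raises ValueError."""
    out = []
    i = 0
    unicode_mode = False
    while i < len(data):
        b = data[i]
        if unicode_mode:
            if b == UC0:
                unicode_mode = False
                i += 1
            elif b >= 0xD8:
                # Surrogates, other Unicode-mode tags, and quoted code units
                # are outside the scope of this sketch.
                raise ValueError(f"unsupported Unicode-mode byte 0x{b:02X}")
            elif i + 1 >= len(data):
                raise ValueError("truncated UTF-16BE code unit")
            else:
                out.append(chr((b << 8) | data[i + 1]))
                i += 2
        elif b == SCU:
            unicode_mode = True
            i += 1
        elif b in PLAIN:
            out.append(chr(b))
            i += 1
        else:
            raise ValueError(f"unsupported single-byte-mode byte 0x{b:02X}")
    return "".join(out)
```

Both ascii_direct("<b>") and ascii_unicode_mode("<b>") decode to the
same three characters, but only the first contains the bytes 3C 62 3E
back to back, which is the point of Peter's concern about a filter that
scans for raw ASCII markup.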

Remember that ISO 8859-1 with no control characters besides NUL, HT, CR,
and LF is also valid SCSU.
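
A quick way to see this, sketched in Python: in the initial SCSU state
(single-byte mode, dynamic window 0 active at U+0080, per UTS #6) every
byte from 0x20 to 0xFF, plus NUL, HT, LF, and CR, decodes to the code
point with the same value. The remaining bytes below 0x20 are tags. The
function names below are invented for illustration.

```python
# Control bytes that pass through unchanged in SCSU single-byte mode.
PASS_THROUGH_CONTROLS = {0x00, 0x09, 0x0A, 0x0D}


def is_passthrough_scsu(data: bytes) -> bool:
    """True if no byte is an SCSU tag, so the Latin-1 reading and the
    SCSU reading (from the initial state) are the same."""
    return all(b >= 0x20 or b in PASS_THROUGH_CONTROLS for b in data)


def decode_passthrough(data: bytes) -> str:
    """Decode tag-free SCSU; this is just the ISO 8859-1 decoding."""
    if not is_passthrough_scsu(data):
        raise ValueError("input contains SCSU tag bytes")
    return data.decode("latin-1")
```

For example, is_passthrough_scsu("café\r\n".encode("latin-1")) is true,
while any input containing a byte such as 0x0F (SCU) is not.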

> Well, here at Opera we had to disable support for two encodings (UTF-7
> and UTF-32) to become HTML5 conformant, if that isn't a waste of
> developer time, I don't know what is :-)

Agreed. It would have been understandable to say that authors of HTML 5
documents SHOULD NOT use such-and-such encodings, and that browsers
SHOULD NOT recognize them in the absence of an explicit encoding
declaration, but to forbid them under all circumstances is unnecessary.

--
Doug Ewell  |  Thornton, Colorado, USA  |  http://www.ewellic.org
RFC 5645, 4645, UTN #14  |  ietf-languages @ http://is.gd/2kf0s
