Re: HTML5 encodings (was: Re: BOCU patent)

From: Doug Ewell (doug@ewellic.org)
Date: Sun Dec 27 2009 - 21:49:35 CST

Next message: Doug Ewell: "Re: HTML5 encodings"

Previous message: Asmus Freytag: "Re: Filtering and displaying untrusted UTF-8"
In reply to: verdy_p: "Re: HTML5 encodings (was: Re: BOCU patent)"
Next in thread: verdy_p: "Re: HTML5 encodings (was: Re: BOCU patent)"
Reply: verdy_p: "Re: HTML5 encodings (was: Re: BOCU patent)"
Reply: Andrew West: "Re: HTML5 encodings (was: Re: BOCU patent)"

Addressing only this one statement for now...

"verdy_p" <verdy underscore p at wanadoo dot fr> wrote:

> Even UTF-32 does not even needs any BOM (because it is self-ordered by
> the position of the NUL byte).

This fails for any byte sequence { 00, xx, yy, 00 } where xx and yy are
both < 0x11, because such a sequence decodes to a valid code point in
either byte order. For example:

Ā U+0100 LATIN CAPITAL LETTER A WITH MACRON
in UTF-32BE: { 00 00 01 00 }
in UTF-32LE: { 00 01 00 00 }

𐀀 U+10000 LINEAR B SYLLABLE B008 A
in UTF-32BE: { 00 01 00 00 }
in UTF-32LE: { 00 00 01 00 }

Naturally, real text is unlikely to consist entirely of such sequences,
so the heuristic would usually work in practice. But that's what the BOM
is for, so that you don't have to rely on heuristics.
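
To make the ambiguity concrete, here is a minimal Python sketch of such a
zero-byte heuristic. The function name guess_utf32_order is hypothetical
(it is not part of any library), and the check is deliberately simplified:
it looks only at the first four bytes and ignores the surrogate range.

```python
def guess_utf32_order(data: bytes) -> str:
    """Guess UTF-32 byte order from the first code unit (hypothetical helper).

    Simplified assumption: a unit is plausible in a given order if it
    decodes to a value <= 0x10FFFF. Surrogates are not excluded here.
    """
    unit = data[:4]
    if len(unit) < 4:
        raise ValueError("need at least 4 bytes")
    # Big-endian: top byte must be 00, next byte at most 0x10.
    be_ok = unit[0] == 0 and unit[1] <= 0x10
    # Little-endian: the same test, applied from the other end.
    le_ok = unit[3] == 0 and unit[2] <= 0x10
    if be_ok and le_ok:
        return "ambiguous"
    if be_ok:
        return "big-endian"
    if le_ok:
        return "little-endian"
    return "invalid"


# U+0100 in UTF-32BE and U+10000 in UTF-32LE are the same four bytes.
same_bytes = "\u0100".encode("utf-32-be")
assert same_bytes == "\U00010000".encode("utf-32-le")

print(guess_utf32_order(same_bytes))               # ambiguous
print(guess_utf32_order("A".encode("utf-32-be")))  # big-endian
print(guess_utf32_order("A".encode("utf-32-le")))  # little-endian

# With a BOM in front, Python's "utf-32" codec needs no guessing.
print((b"\x00\x00\xfe\xff" + same_bytes).decode("utf-32") == "\u0100")
print((b"\xff\xfe\x00\x00" + same_bytes).decode("utf-32") == "\U00010000")
```

The last two lines both print True: the same four payload bytes decode to
different characters depending only on the BOM that precedes them.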

--
Doug Ewell  |  Thornton, Colorado, USA  |  http://www.ewellic.org
RFC 5645, 4645, UTN #14  |  ietf-languages @ http://is.gd/2kf0s

This archive was generated by hypermail 2.1.5 : Sun Dec 27 2009 - 21:52:12 CST