From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sun Feb 04 2007 - 14:55:54 CST
From: "Doug Ewell" <dewell@adelphia.net>
> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>> The main risk is caused by the ambiguity of the sentence which does
>> not indicate that it really encodes the codepoint U+FEFF normally
>> (i.e. it changes the current state), and that does not specify if the
>> leading BOM is required or optional.
>
> I'm sure it would not be difficult to edit Section 2.5 to explain this,
> something like:
>
> "An initial U+FEFF is encoded in BOCU-1 with the three bytes FB EE 28.
> Note that adding or stripping an initial U+FEFF generally requires the
> next code point above U+0020 to be re-encoded."
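... unless a C0 control character (below U+0020) occurs before such a code point (above U+0020). There's no re-encoding if the first non-SPACE character after the leading BOM is a control, such as an end-of-line sequence or a tab, or if it's a character in the U+FE80..U+FEFF range. SPACE does not count here because BOCU-1 encodes it without changing the state, whereas every C0 control resets the state to its initial value, so whatever follows the control is encoded the same way with or without the BOM.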
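A minimal Python sketch of the encoding side follows. It assumes the reference BOCU-1 algorithm as described in Unicode Technical Note #6: the state starts at 0x40, every C0 control resets it to 0x40, SPACE leaves it unchanged, and every other code point is written as a 1- to 4-byte difference from the state. The names `encode`, `pack_diff`, `next_prev` and `trail_byte` are hypothetical, and input validation is omitted. Encoding the same text with and without a leading U+FEFF lets you compare the bytes that follow the BOM's FB EE 28.

```python
# Minimal BOCU-1 encoder sketch, modelled on the reference algorithm in
# Unicode Technical Note #6. Assumptions: input is a str of Unicode scalar
# values; no error handling; function names are illustrative only.

ASCII_PREV = 0x40      # initial state, and the state after any C0 control
MIDDLE = 0x90          # byte value for a difference of 0
TRAIL_CONTROLS = 20    # C0 byte values that may also serve as trail bytes
TRAIL_OFFSET = 0x21 - TRAIL_CONTROLS
TRAIL_COUNT = (0xFF - 0x21 + 1) + TRAIL_CONTROLS  # 243 trail byte values

REACH_POS_1, REACH_NEG_1 = 63, -64
REACH_POS_2 = REACH_POS_1 + 43 * TRAIL_COUNT
REACH_NEG_2 = REACH_NEG_1 - 43 * TRAIL_COUNT
REACH_POS_3 = REACH_POS_2 + 3 * TRAIL_COUNT ** 2
REACH_NEG_3 = REACH_NEG_2 - 3 * TRAIL_COUNT ** 2

START_POS_2 = MIDDLE + REACH_POS_1 + 1  # 0xD0
START_POS_3 = START_POS_2 + 43          # 0xFB
START_POS_4 = START_POS_3 + 3           # 0xFE
START_NEG_2 = MIDDLE + REACH_NEG_1      # 0x50
START_NEG_3 = START_NEG_2 - 43          # 0x25
START_NEG_4 = START_NEG_3 - 3           # 0x22

# Trail values 0..19 are written as these C0 bytes.
TRAIL_TO_C0 = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x10, 0x11, 0x12, 0x13,
               0x14, 0x15, 0x16, 0x17, 0x18, 0x19, 0x1C, 0x1D, 0x1E, 0x1F]


def trail_byte(t):
    return TRAIL_TO_C0[t] if t < TRAIL_CONTROLS else t + TRAIL_OFFSET


def pack_diff(diff):
    """Encode the difference between a code point and the current state."""
    if REACH_NEG_1 <= diff <= REACH_POS_1:
        return bytes([MIDDLE + diff])
    if diff > 0:
        if diff <= REACH_POS_2:
            diff, lead, count = diff - (REACH_POS_1 + 1), START_POS_2, 1
        elif diff <= REACH_POS_3:
            diff, lead, count = diff - (REACH_POS_2 + 1), START_POS_3, 2
        else:
            diff, lead, count = diff - (REACH_POS_3 + 1), START_POS_4, 3
    else:
        if diff >= REACH_NEG_2:
            diff, lead, count = diff - REACH_NEG_1, START_NEG_2, 1
        elif diff >= REACH_NEG_3:
            diff, lead, count = diff - REACH_NEG_2, START_NEG_3, 2
        else:
            diff, lead, count = diff - REACH_NEG_3, START_NEG_4, 3
    trail = []
    for _ in range(count):
        diff, m = divmod(diff, TRAIL_COUNT)  # floor division, also for negatives
        trail.append(trail_byte(m))
    return bytes([lead + diff] + trail[::-1])  # lead byte first


def next_prev(c):
    """State after encoding code point c (> U+0020)."""
    if 0x3040 <= c <= 0x309F:           # Hiragana
        return 0x3070
    if 0x4E00 <= c <= 0x9FA5:           # CJK Unified Ideographs
        return 0x4E00 - REACH_NEG_2
    if 0xAC00 <= c <= 0xD7A3:           # Hangul syllables
        return (0xAC00 + 0xD7A3) // 2
    return (c & ~0x7F) + ASCII_PREV     # middle of the code point's 128-block


def encode(text):
    prev, out = ASCII_PREV, bytearray()
    for ch in text:
        c = ord(ch)
        if c <= 0x20:
            # C0 controls and SPACE are copied as single bytes.
            # A control resets the state; SPACE leaves it unchanged.
            if c != 0x20:
                prev = ASCII_PREV
            out.append(c)
        else:
            out += pack_diff(c - prev)
            prev = next_prev(c)
    return bytes(out)


if __name__ == "__main__":
    print(encode("\ufeff").hex())                            # fbee28
    print(encode("\ufeff\r\nabc")[3:] == encode("\r\nabc"))  # True: CR resets the state
    print(encode("\ufeffabc")[3:] == encode("abc"))          # False: "a" is re-encoded
```

In the last case only the first code point above U+0020 changes: once it has been encoded, the state depends on that code point alone, so the remaining bytes are identical with or without the BOM.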
This archive was generated by hypermail 2.1.5 : Sun Feb 04 2007 - 14:58:01 CST