Re: UTS#40 (BOCU-1) ambiguity and possible serious bug about leading BOM

From: Doug Ewell (dewell@adelphia.net)
Date: Sat Feb 03 2007 - 18:14:54 CST

Next message: Philippe Verdy: "Re: New translation posted"

Previous message: Jukka K. Korpela: "Re: New translation posted"
In reply to: Philippe Verdy: "UTS#40 (BOCU-1) ambiguity and possible serious bug about leading BOM"
Next in thread: Philippe Verdy: "Re: UTS#40 (BOCU-1) ambiguity and possible serious bug about leading BOM"
Reply: Philippe Verdy: "Re: UTS#40 (BOCU-1) ambiguity and possible serious bug about leading BOM"

Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

> An initial U+FEFF is encoded in BOCU-1 with the three bytes FB EE 28.
>
> This is correct if one sees the leading BOM as if it was encoding a
> significant codepoint which is part of the text, and then encoded with
> the normal BOCU-1 algorithm; however this paragraph does not state the
> effect of this encoded sequence on the current state of the encoder
> and of the decoder.
>
> This may cause a difference when interpreting the next bytes after
> this BOM, if this is not an ASCII byte, because the initial state is
> normally prev=0x40; but according to the BOCU-1 profile, this sequence
> should change the state to prev=0xFEC0 (according to rule R5
> Adjustment: "d. Otherwise, set prev to the middle of a 128-block:
> prev=(c&~0x7F)+0x40").

I covered all of this three years ago in Unicode Technical Note #14.
Look for the paragraph in the BOCU-1 section that begins "Because each
character..."

It's possible to encode a signature safely in BOCU-1 by following it
with an FF reset byte, as you and Frank observed, but the spec
discourages FF resets.
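
To make the state issue concrete, here is a minimal sketch of the adjustment quoted above. It assumes only what this thread states: the initial state is prev=0x40, and the general "otherwise" case sets prev to the middle of the character's 128-block. The function and constant names are illustrative, and the script-specific cases that precede case d in the rule are deliberately omitted.

```python
# Illustrative sketch only; names are hypothetical, not taken from any library.
INITIAL_PREV = 0x40  # initial encoder/decoder state, as stated in the thread


def adjust_prev(c: int) -> int:
    """General ("otherwise") case: middle of the 128-block containing c.

    The script-specific cases that come before this one in the rule
    are not modeled here.
    """
    return (c & ~0x7F) + 0x40


# State after a leading U+FEFF is processed as an ordinary character.
prev_after_signature = adjust_prev(0xFEFF)
print(hex(prev_after_signature))

# The next character is encoded as a difference from prev, so the two
# possible states give very different deltas for the same code point.
next_char = 0x00E9
print(next_char - INITIAL_PREV)          # signature stripped, state left at 0x40
print(next_char - prev_after_signature)  # signature kept as part of the text
```

A process that strips the three signature bytes without also accounting for the changed state would compute the first difference, while the encoder used the second. Following the signature with a reset puts both sides back at prev=0x40.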

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://users.adelphia.net/~dewell/
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

This archive was generated by hypermail 2.1.5 : Sat Feb 03 2007 - 18:17:17 CST