Re: UTS#40 (BOCU-1) ambiguity and possible serious bug about leading BOM

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Sat Feb 03 2007 - 07:48:31 CST


    Note that, because of this ambiguity, it should be recommended to encode the special RESET byte (FF) immediately after the leading BOM, i.e. to write the leading BOM as: FB EE 28 FF
    This way, the state after the BOM is reset to its initial value, prev=0x0040!

    I also think that the sentence describing the 3-byte sequence FB EE 28 should explicitly say that it encodes the difference 0xFEFF - 0x0040 = 0xFEBF using the base-243 encoding.
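
    To check that arithmetic, here is a minimal Python sketch of only the 3-byte positive-difference branch of the encoder. The constants are as I read them from the sample code that accompanies the BOCU-1 description; the names (encode_three_byte_diff, trail_to_byte, and so on) are mine, and every other branch is deliberately left out.

```python
# Sketch of the 3-byte positive-difference branch of BOCU-1 only.
# Constants follow my reading of the published sample code; all names
# here are illustrative and do not come from any library.

INITIAL_PREV = 0x40
TRAIL_COUNT = 243                      # usable trail byte values (the "base")
# The first 20 trail values map onto C0 bytes that BOCU-1 allows in trails.
TRAIL_CONTROLS = [0x01, 0x02, 0x03, 0x04, 0x05, 0x06,
                  0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
                  0x18, 0x19, 0x1C, 0x1D, 0x1E, 0x1F]
TRAIL_BYTE_OFFSET = 0x21 - len(TRAIL_CONTROLS)      # 13
REACH_POS_2 = 63 + 43 * TRAIL_COUNT                 # largest 2-byte difference
REACH_POS_3 = REACH_POS_2 + 3 * TRAIL_COUNT ** 2    # largest 3-byte difference
START_POS_3 = 0xFB                                  # first 3-byte positive lead


def trail_to_byte(t):
    """Map a trail value in 0..242 to its byte value."""
    if t < len(TRAIL_CONTROLS):
        return TRAIL_CONTROLS[t]
    return t + TRAIL_BYTE_OFFSET


def encode_three_byte_diff(diff):
    """Encode a difference that falls in the 3-byte positive range."""
    if not REACH_POS_2 < diff <= REACH_POS_3:
        raise ValueError("difference is outside the 3-byte positive range")
    diff -= REACH_POS_2 + 1
    diff, t2 = divmod(diff, TRAIL_COUNT)         # last trail byte
    lead_offset, t1 = divmod(diff, TRAIL_COUNT)  # first trail byte, lead offset
    return bytes([START_POS_3 + lead_offset,
                  trail_to_byte(t1), trail_to_byte(t2)])


bom = encode_three_byte_diff(0xFEFF - INITIAL_PREV)  # difference 0xFEBF
print(bom.hex(" ").upper())  # prints: FB EE 28
```

    So the signature bytes are nothing special: they are simply what the ordinary difference encoder produces for U+FEFF when starting from the initial state.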

    Note that nothing elsewhere in the specification indicates that U+FEFF resets the state (only the ASCII characters other than SPACE, and the RESET byte FF, have this effect).
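
    The state rule I am relying on can be sketched in Python as follows. This is my reading of the state update in the specification's sample code, not a normative statement; next_prev and the RESET marker are hypothetical names introduced only for this illustration.

```python
# Sketch of the BOCU-1 state ("prev") update, as I understand it.
# next_prev and RESET are illustrative names, not part of any API.

INITIAL_PREV = 0x40
RESET = "RESET"   # stands in for the FF byte, which is not a code point


def next_prev(prev, c):
    """Return the state that follows code point c (or the RESET byte)."""
    if c == RESET:
        return INITIAL_PREV
    if c == 0x20:
        return prev               # SPACE leaves the state unchanged
    if c < 0x20:
        return INITIAL_PREV       # C0 controls reset the state
    if 0x3040 <= c <= 0x309F:
        return 0x3070             # Hiragana: middle of the block
    if 0x4E00 <= c <= 0x9FA5:
        return 0x7711             # CJK Unihan: fixed value inside the block
    if 0xAC00 <= c <= 0xD7A3:
        return 0xC1D1             # Hangul: middle of the block
    return (c & ~0x7F) + 0x40     # general case: middle of the 128-block


after_bom = next_prev(INITIAL_PREV, 0xFEFF)
print(hex(after_bom))                    # prints: 0xfec0
print(hex(next_prev(after_bom, RESET)))  # prints: 0x40
```

    Under this reading the state after a bare FB EE 28 is 0xFEC0, not 0x0040, which is why the trailing FF matters. The general case also shows why any ASCII character other than SPACE brings the state back to 0x0040.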


