Re: Unicode 4.0 BETA available for review

From: Markus Scherer
Date: Wed Feb 26 2003 - 19:56:43 EST

    Yung-Fong Tang wrote:
    > I see a hole here. How about UTF-8 representing a pair of surrogate
    > code points with two 3-octet sequences instead of a single UTF-8
    > sequence? It should be ill-formed since it is also a non-shortest form,
    > right? But we really need to watch the language used there so we
    > don't create a new problem. I DO NOT want people to think that one 3-octet
    > UTF-8 sequence for a low or high surrogate is ill-formed, but that a 3-octet
    > high surrogate followed by a 3-octet low surrogate is legal.

    How would you infer that a pair of ill-formed sequences is not itself ill-formed, without any
    specific text allowing it?

    Remember also that such pairs of 3-byte surrogate sequences were forbidden at the same time CESU-8
    was created.
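
    As an illustration only (not part of the standard's text), the point can be checked against a
    strict UTF-8 decoder. The sketch below assumes Python 3's built-in "utf-8" codec as the
    decoder; the helper name is_well_formed_utf8 and the constants are made up for this example.
    It shows that a 3-byte surrogate sequence is rejected on its own, that a high/low pair of such
    sequences is rejected as well, and that the single 4-byte sequence for the same code point is
    accepted.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data is accepted by Python's strict UTF-8 decoder.

    Hypothetical helper for illustration; it relies on the built-in codec
    rejecting 3-byte encodings of surrogate code points.
    """
    try:
        data.decode("utf-8")  # default error handling is "strict"
        return True
    except UnicodeDecodeError:
        return False


HIGH = b"\xed\xa0\x80"               # 3-byte form of high surrogate U+D800
LOW = b"\xed\xb0\x80"                # 3-byte form of low surrogate U+DC00
SUPPLEMENTARY = b"\xf0\x90\x80\x80"  # 4-byte UTF-8 form of U+10000

print(is_well_formed_utf8(HIGH))           # False: a lone high surrogate
print(is_well_formed_utf8(LOW))            # False: a lone low surrogate
print(is_well_formed_utf8(HIGH + LOW))     # False: the pair is no better
print(is_well_formed_utf8(SUPPLEMENTARY))  # True: the single 4-byte sequence
```

    U+D800 followed by U+DC00 is the UTF-16 surrogate pair for U+10000, so HIGH + LOW above is
    exactly the six-byte form in question, and SUPPLEMENTARY is the only well-formed UTF-8
    encoding of that code point.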


    Opinions expressed here may not reflect my company's positions unless otherwise noted.
