Re: UTF-8 (was:Unicode 4.0 BETA available for review)

From: Asmus Freytag
Date: Wed Feb 26 2003 - 21:18:05 EST

    Can we retitle this thread?

    I'm getting actual replies to my posting of the BETA that I need to keep
    track of, and the run-on discussion of UTF-8 under this title is distracting.

    Thanks for your help,

    At 04:56 PM 2/26/03 -0800, you wrote:
    >Yung-Fong Tang wrote:
    >>I see a hole here. How about UTF-8 representing a pair of surrogate
    >>code points with two 3-octet sequences instead of a single 4-octet UTF-8
    >>sequence? It should be ill-formed since it is also non-shortest form,
    >>right? But we really need to watch the language used there so we
    >>don't create a new problem. I DO NOT want people to think that a single
    >>3-octet UTF-8 high or low surrogate is ill-formed but that a 3-octet
    >>UTF-8 high surrogate followed by a 3-octet UTF-8 low surrogate is legal.
    >How would you infer that a pair of any ill-formed sequences is not also
    >ill-formed, without any specific text allowing such?
    >Remember also that such pairs of 3-byte surrogate sequences were forbidden
    >at the same time CESU-8 was created.
    >Opinions expressed here may not reflect my company's positions unless
    >otherwise noted.
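
To make the quoted point concrete, here is a minimal sketch, assuming Python 3's built-in strict "utf-8" codec as the validator. The helper name is_well_formed_utf8 and the sample code point U+10000 are illustrative choices, not anything taken from the thread. It checks that a lone 3-octet surrogate sequence is rejected, that a high/low pair of such sequences is rejected as well, and that the 4-octet form of the same supplementary code point is accepted.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# 3-octet UTF-8-style encodings of the surrogate code points for U+10000.
HIGH_SURROGATE = bytes([0xED, 0xA0, 0x80])  # U+D800
LOW_SURROGATE = bytes([0xED, 0xB0, 0x80])   # U+DC00

# The 4-octet UTF-8 encoding of U+10000 itself.
SUPPLEMENTARY = bytes([0xF0, 0x90, 0x80, 0x80])

if __name__ == "__main__":
    for label, data in [
        ("lone high surrogate", HIGH_SURROGATE),
        ("lone low surrogate", LOW_SURROGATE),
        ("high + low surrogate pair", HIGH_SURROGATE + LOW_SURROGATE),
        ("4-octet form of U+10000", SUPPLEMENTARY),
    ]:
        print(f"{label}: {data.hex(' ')} well-formed={is_well_formed_utf8(data)}")
```

The 6-octet paired form is what CESU-8 uses for supplementary characters, so a decoder that accepts it is implementing CESU-8 rather than UTF-8.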