Re: Unicode 4.0 BETA available for review

From: Kenneth Whistler
Date: Thu Feb 27 2003 - 15:42:43 EST

    Frank Tang asked:

    > >> This discussion has been centered around UTF-8. But I hope the
    > >>corresponding rules apply to UTF-16 and UTF-32 for Unicode 4.0:
    > >>
    > >>. for UTF-32: occurrences of 'surrogates' are ill-formed.
    > >>
    > >>
    > >>
    > How about UTF-32 sequence which the 4 bytes represent value > U+10FFFF ?
    > Are they considered ill-formed? Should they?

    Yes, they are ill-formed.

    Since all the encoding forms are based on the Unicode scalar values,
    and since the Unicode scalar values are *defined* to be the
    range 0x0000..0xD7FF, 0xE000..0x10FFFF, any attempt to represent
    a code point higher than U+10FFFF in *any* encoding form is
    ill-formed.

    This will be called out explicitly in the Unicode 4.0 text, in
    case anyone still has the question:

    " * Any UTF-32 code unit greater than 0010FFFF₁₆ is
    ill-formed."

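
    For readers who want to see the rule in code, here is a minimal
    sketch in Python of the scalar-value range given above, applied to
    UTF-32 code units. The names is_scalar_value and
    first_ill_formed_utf32 are made up for illustration and do not come
    from the Unicode text or from any library; the input is assumed to
    be a sequence of non-negative integers, one per 32-bit code unit.

```python
# Illustrative sketch only: the function names below are hypothetical.
MAX_SCALAR = 0x10FFFF  # highest Unicode scalar value


def is_scalar_value(value: int) -> bool:
    """Return True if value is in 0x0000..0xD7FF or 0xE000..0x10FFFF."""
    return 0x0000 <= value <= 0xD7FF or 0xE000 <= value <= MAX_SCALAR


def first_ill_formed_utf32(code_units):
    """Return the index of the first ill-formed UTF-32 code unit, or None.

    A code unit is treated as ill-formed if it is a surrogate
    (0xD800..0xDFFF) or greater than 0x10FFFF, i.e. not a scalar value.
    """
    for index, unit in enumerate(code_units):
        if not is_scalar_value(unit):
            return index
    return None
```

    Under these assumptions, first_ill_formed_utf32([0x41, 0x110000])
    reports index 1, and a surrogate such as 0xD800 is rejected in the
    same way as a value above 0x10FFFF.
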
    I can keep answering these questions, but I can also assure
    everyone that the UTC worked *very* hard this time around to
    make the character encoding model much clearer in the Unicode 4.0
    text, and to anticipate all these edge cases.


    This archive was generated by hypermail 2.1.5 : Thu Feb 27 2003 - 16:27:20 EST