Re: Unicode 4.0 BETA available for review

From: Yung-Fong Tang
Date: Thu Feb 27 2003 - 14:53:10 EST

    Stefan Persson wrote:

    > Kenneth Whistler wrote:
    >> Unicode 3.0 defined non-shortest UTF-8 as *irregular* code value
    >> sequences. There were two types:
    >> a. 0xC0 0x80 for U+0000 (instead of 0x00)
    >> b. 0xED 0xA0 0x80 0xED 0xB0 0x80 for U+10000 (instead of 0xF0 0x90
    >> 0x80 0x80)
    > Ah, but encoding NULL as a surrogate pair and then encoding each of
    > those two surrogates as three bytes, for a total of 6 bytes per
    > character, would also be technically possible (though not legal), right?

    How? Surrogate pairs can only be used to represent U+10000 - U+10FFFF.
    It is IMPOSSIBLE to use a surrogate pair to represent any character in
    the range U+0000 - U+FFFF, including U+0000, which is NULL.
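
    To make the arithmetic concrete, here is a minimal Python sketch of the
    standard surrogate-pair formula. The helper names (combine_surrogates,
    encode_3byte) are made up for this illustration and are not from any
    library. It shows that the lowest pair already maps to U+10000, so no
    pair can reach U+0000, and it reproduces the 6-byte irregular sequence
    quoted above.

```python
HIGH_MIN, HIGH_MAX = 0xD800, 0xDBFF   # high (lead) surrogate range
LOW_MIN, LOW_MAX = 0xDC00, 0xDFFF     # low (trail) surrogate range


def combine_surrogates(high, low):
    """Map a UTF-16 surrogate pair to the code point it represents."""
    if not HIGH_MIN <= high <= HIGH_MAX:
        raise ValueError("not a high surrogate: %#06x" % high)
    if not LOW_MIN <= low <= LOW_MAX:
        raise ValueError("not a low surrogate: %#06x" % low)
    # The fixed 0x10000 offset is why a pair can never land in U+0000 - U+FFFF.
    return 0x10000 + ((high - HIGH_MIN) << 10) + (low - LOW_MIN)


def encode_3byte(unit):
    """Apply the three-byte UTF-8 bit pattern to a 16-bit value (0x0800 - 0xFFFF).

    Hypothetical helper: it deliberately skips the surrogate check that a
    conforming encoder performs, to show how the irregular form arises.
    """
    return bytes([
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ])


smallest = combine_surrogates(0xD800, 0xDC00)   # lowest possible pair
largest = combine_surrogates(0xDBFF, 0xDFFF)    # highest possible pair

# Six-byte irregular form of U+10000 versus the regular four-byte form.
irregular = encode_3byte(0xD800) + encode_3byte(0xDC00)
regular = "\U00010000".encode("utf-8")

print(hex(smallest), hex(largest))
print(irregular.hex(" "), "vs", regular.hex(" "))
```

    Since every valid pair yields at least 0x10000, the only irregular form
    of U+0000 is the two-byte non-shortest sequence 0xC0 0x80.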

    > Stefan
