Date: Mon Jun 26 2006 - 16:40:33 CDT

    > > One essential detail being that UTF-16 surrogates are excluded
    > > from the valid Unicode codepoints, while UTF-8 "surrogates"
    > > have binary values that are also valid Unicode codepoints.
    > I almost added that but held back because it seemed to me that that's
    > not really a difference in these encoding forms but rather is just a
    > fact about the coded character set. But then, IIRC UTF-16 is not able to
    > represent code points U+D800..U+DFFF while UTF-8 is.

    Nope. Neither can.

    A lone 0xD800 code unit is ill-formed in UTF-16.

    0xED 0xA0 0x80 is ill-formed in UTF-8.

    For that matter, 0x0000D800 is ill-formed in UTF-32.

    Look it up.
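
    One way to check this without opening the standard is to feed the
    three sequences to a strict decoder. The sketch below is only an
    illustration, not part of the original argument: it assumes Python
    3.4 or later, whose built-in UTF-8, UTF-16 and UTF-32 decoders
    reject surrogate code points, and the helper name is_well_formed is
    made up for this example.

```python
def is_well_formed(data: bytes, encoding: str) -> bool:
    """Return True if `data` is accepted by Python's strict decoder for `encoding`.

    Hypothetical helper for illustration; it relies on the decoder's
    behavior (Python 3.4+) rather than implementing the standard's rules.
    """
    try:
        data.decode(encoding, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# The three sequences from the message, each an attempt to encode U+D800 alone.
lone_utf16 = bytes([0x00, 0xD8])              # code unit 0xD800, little-endian
lone_utf8 = bytes([0xED, 0xA0, 0x80])
lone_utf32 = bytes([0x00, 0xD8, 0x00, 0x00])  # 0x0000D800, little-endian

# Contrast: 0xD800 followed by 0xDC00 is a valid surrogate pair in UTF-16
# and decodes to the single code point U+10000.
pair_utf16 = bytes([0x00, 0xD8, 0x00, 0xDC])

if __name__ == "__main__":
    for label, data, encoding in [
        ("lone 0xD800 in UTF-16", lone_utf16, "utf-16-le"),
        ("ED A0 80 in UTF-8", lone_utf8, "utf-8"),
        ("0x0000D800 in UTF-32", lone_utf32, "utf-32-le"),
        ("D800 DC00 in UTF-16", pair_utf16, "utf-16-le"),
    ]:
        verdict = "well-formed" if is_well_formed(data, encoding) else "ill-formed"
        print(f"{label}: {verdict}")
```

    Under that assumption the first three sequences are rejected, and
    only the paired form is accepted, since it represents U+10000
    rather than U+D800.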

    Now, anybody could put those values into a Unicode string
    and claim to be representing U+D800, but as a famous
    former president said, they "would be wrong." *hehe*


