From: Phillips, Addison
Date: Fri Jul 04 2008 - 10:31:43 CDT

    See Section 3.8 in the standard.

    In my experience, it is a lot clearer to folks if you do not refer to surrogate code points as anything other than reserved. UTF-16 uses code units to encode Unicode code points.

    Formally, the code points in Unicode run from U+0000 through U+10FFFF, so the surrogate code points are code points. However, the code points from U+D800 through U+DFFF are reserved and do not encode characters. Section 3.9 says:

    "Each encoding form maps the Unicode code points U+0000..U+D7FF and
    U+E000..U+10FFFF to unique code unit sequences."

    So, the surrogate pair (of code units) encodes a code point (U+20045 in your example).
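
    To make the arithmetic concrete, here is a minimal Python sketch of how UTF-16 turns a supplementary code point into a surrogate pair of code units and back. The function names are hypothetical, chosen only for this illustration; they do not come from the standard or from any library.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into two UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as a surrogate pair")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Combine two UTF-16 code units back into the code point they encode."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a well-formed surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x20045)
print(f"{high:04X} {low:04X}")  # D840 DC45
```

    The two values printed are code units. Only the pair taken together corresponds to a code point, which you can check against Python's own encoder with chr(0x20045).encode("utf-16-be").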


    > OK, and when you have them together in a surrogate pair, do you call it a
    > pair of code units or can you also call them a pair of code points?
