Re: In UTF-16 no codepoints are assigned to D800 - DFFF ... is that range also reserved in UTF-8 and UTF-32?

From: Martinho Fernandes <martinho.fernandes_at_gmail.com>
Date: Fri, 25 Jan 2013 15:41:22 +0100

On Fri, Jan 25, 2013 at 3:15 PM, Costello, Roger L. <costello_at_mitre.org> wrote:
> I learned that the range from D800 to DFFF is reserved because it is used to create variable-length UTF-16 strings.
>
> Thus, there are no codepoints assigned to the range D800 to DFFF in UTF-16.
>
> Does that mean there are no codepoints assigned to the range D800 to DFFF in UTF-8 and UTF-32? I assume that's the case, but just want to check to be sure.
>

Code points are assigned in the Unicode code point space, not in the encodings.

All the UTF encodings share the same code point space. UTF-16 uses the
code unit values D800-DFFF as surrogates, so code points in that range
cannot be represented in UTF-16. They are therefore reserved and will
never have characters assigned to them.
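
To make the UTF-16 side concrete, here is a minimal Python sketch of how
a code point above FFFF is split into a surrogate pair. The function name
utf16_code_units is made up for this illustration and is not part of any
library; the arithmetic follows the usual UTF-16 definition.

```python
def utf16_code_units(code_point: int) -> list:
    """Return the UTF-16 code units for one code point (illustrative helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode code point space")
    if 0xD800 <= code_point <= 0xDFFF:
        # These values are taken by the surrogate mechanism itself,
        # so a code point with one of them has no UTF-16 form.
        raise ValueError("surrogate code points cannot be encoded in UTF-16")
    if code_point < 0x10000:
        # BMP code points are a single code unit.
        return [code_point]
    # Supplementary code points: split the 20-bit offset into two halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate, D800-DBFF
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate, DC00-DFFF
    return [high, low]


# U+1F600 becomes the pair D83D DE00.
print([f"{unit:04X}" for unit in utf16_code_units(0x1F600)])
```

Every value in D800-DFFF is used as one half of such a pair, which is why
none of them is left over to stand for a character of its own.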

Each encoding has rules for encoding each code point value. The rules
for UTF-8 and UTF-32 *could* be extended to encode the values in
D800-DFFF, but because no characters are assigned to them, those values
only show up when something has gone wrong. So the Unicode standard
says that these values cannot be encoded in UTF-8 or UTF-32 either.
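
One way to see this in practice is with Python 3's built-in codecs, which
refuse a lone surrogate in all three encoding forms. This is only a sketch
of one implementation's behaviour, and the helper name strictly_encodable
is invented for the example. The "surrogatepass" error handler shows what
the extended UTF-8 rules would produce if the restriction were lifted.

```python
def strictly_encodable(text: str, codec: str) -> bool:
    """Return True if Python's strict codec accepts the text (illustrative helper)."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


lone_surrogate = "\ud800"

for codec in ("utf-8", "utf-16", "utf-32"):
    print(codec, strictly_encodable(lone_surrogate, codec))

# Applying the generic three-byte UTF-8 pattern anyway gives ED A0 80,
# which a conforming UTF-8 decoder rejects as ill-formed.
raw = lone_surrogate.encode("utf-8", errors="surrogatepass")
print(raw.hex())

try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("strict UTF-8 decoder rejects the bytes")
```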

Kind regards,

Martinho
Received on Fri Jan 25 2013 - 08:42:18 CST
