Counting Codepoints

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Sun, 11 Oct 2015 22:20:34 +0100

Is the number of codepoints in a UTF-16 string well defined?

For example, which of the following two statements is true?

(a) The ill-formed three code-unit Unicode 16-bit string <0xDC00,
0xD800, 0xDC20> contains two codepoints, U+DC00 and U+10020.

(b) The ill-formed three code-unit Unicode 16-bit string <0xDC00,
0xD800, 0xDC20> contains three codepoints, U+DC00, U+D800 and U+DC20.

Statement (a) is probably more useful, but I couldn't find anything
that rules out statement (b).
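
To make the two readings concrete, here is a minimal Python sketch of
both counting rules applied to the code-unit sequence above. The
function names are hypothetical, and neither rule is claimed to be the
one the standard mandates; reading (a) is assumed to pair a high
surrogate with an immediately following low surrogate and to treat any
other surrogate as a lone code point.

```python
def codepoints_pairing(units):
    """Reading (a): a high surrogate immediately followed by a low
    surrogate forms one supplementary code point; every other code
    unit, including an unpaired surrogate, is a code point by itself."""
    result = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            low = units[i + 1]
            # Standard UTF-16 surrogate-pair arithmetic.
            result.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        else:
            result.append(u)
            i += 1
    return result


def codepoints_per_unit(units):
    """Reading (b): because the string is ill-formed, no pairing is
    attempted and each code unit is taken as a code point."""
    return list(units)


units = [0xDC00, 0xD800, 0xDC20]
reading_a = codepoints_pairing(units)   # U+DC00, U+10020
reading_b = codepoints_per_unit(units)  # U+DC00, U+D800, U+DC20
print(len(reading_a), [f"U+{cp:04X}" for cp in reading_a])
print(len(reading_b), [f"U+{cp:04X}" for cp in reading_b])
```

Under reading (a) the count is two, under reading (b) it is three, so
the two rules disagree for this ill-formed input even though they
agree for any well-formed UTF-16 string without supplementary
characters.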

Richard.
Received on Sun Oct 11 2015 - 16:21:57 CDT
