Re: Surrogates and noncharacters

From: Hans Aberg <haberg-1_at_telia.com>
Date: Sun, 10 May 2015 20:35:41 +0200

> On 10 May 2015, at 12:23, Richard Wordingham <richard.wordingham_at_ntlworld.com> wrote:

>> However I wonder what would be the effect of D80 in UTF-32: is
>> <0xFFFFFFFF> a valid "32-bit string" ?
>
> The value 0xFFFFFFFF cannot appear in a UTF-32 string. Therefore it
> cannot represent a unit of encoded text in a UTF-32 string.

Even though values with the highest bit set are not part of the original UTF-32, the original UTF-8 can easily be extended to cover them as well, which may be simpler to implement.
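
As a rough illustration of such an extension, here is a minimal Python sketch. It assumes the original UTF-8 bit layout (sequences of up to 6 bytes, covering 31 bits) and adds a hypothetical 7-byte form with lead byte 0xFE for values with the highest bit set. That 7-byte form and the function name encode_extended_utf8 are assumptions made for this example only; neither is part of any standard.

```python
def encode_extended_utf8(value: int) -> bytes:
    """Encode an unsigned 32-bit value with the original UTF-8 bit pattern.

    Values below 2**31 use the original 1- to 6-byte forms. Values with the
    highest bit set use a hypothetical 7-byte form (lead byte 0xFE), which is
    an assumption of this sketch and not part of any standard.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("value does not fit in 32 bits")
    if value < 0x80:
        return bytes([value])

    # (exclusive upper limit, lead-byte marker, number of continuation bytes)
    forms = [
        (0x800, 0xC0, 1),
        (0x10000, 0xE0, 2),
        (0x200000, 0xF0, 3),
        (0x4000000, 0xF8, 4),
        (0x80000000, 0xFC, 5),
        (0x100000000, 0xFE, 6),  # hypothetical extension to the full 32 bits
    ]
    for limit, marker, n in forms:
        if value < limit:
            break

    # The lead byte carries the marker plus the topmost payload bits;
    # each continuation byte carries 6 payload bits under the 10xxxxxx pattern.
    out = [marker | (value >> (6 * n))]
    for i in range(n - 1, -1, -1):
        out.append(0x80 | ((value >> (6 * i)) & 0x3F))
    return bytes(out)
```

Under these assumptions, encode_extended_utf8(0xFFFFFFFF) yields the seven bytes FE 83 BF BF BF BF BF, while values in the Unicode range come out the same as in standard UTF-8.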
Received on Sun May 10 2015 - 13:37:24 CDT
