Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character? from Norbert Lindenberg on 2015-05-09 (Unicode Mail List Archive)

From: Norbert Lindenberg <unicode_at_lindenbergsoftware.com>
Date: Fri, 8 May 2015 23:26:56 -0700

RFC 7158 section 7 [1] provides not only the \uXXXX notation for Unicode code points in the Basic Multilingual Plane, but also a 12-character sequence encoding a UTF-16 surrogate pair (i.e. \uYYYY\uZZZZ with 0xD800 ≤ YYYY < 0xDC00 ≤ ZZZZ ≤ 0xDFFF) for supplementary Unicode code points. A tool checking for escape sequences that don’t correspond to any Unicode character must be aware of this, because neither \uYYYY nor \uZZZZ by itself corresponds to any Unicode character, but their combination may well do so.
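
As a rough illustration of such a check, here is a minimal Python sketch. It is not an existing tool; the function name find_suspect_escapes is made up for this example. It assumes that "does not correspond to a Unicode character" means either a lone surrogate or a code point whose general category is Cn (unassigned or noncharacter) in the Unicode database bundled with the running Python, so results can differ between Python versions. Private-use code points are not flagged.

```python
import re
import unicodedata

# Match any backslash escape. Consuming "\\" as a unit keeps an escaped
# backslash followed by "u0041" from being misread as a \uXXXX escape.
TOKEN = re.compile(r'\\(?:u([0-9A-Fa-f]{4})|.)', re.DOTALL)


def find_suspect_escapes(json_text):
    """Return (offset, escape_text, reason) for escapes naming no character."""
    # Collect every \uXXXX escape as (offset, 16-bit code unit).
    units = [(m.start(), int(m.group(1), 16))
             for m in TOKEN.finditer(json_text) if m.group(1) is not None]
    problems = []
    i = 0
    while i < len(units):
        pos, unit = units[i]
        length = 6
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: only meaningful if a low surrogate escape
            # follows immediately (the 12-character form).
            nxt = units[i + 1] if i + 1 < len(units) else None
            if nxt and nxt[0] == pos + 6 and 0xDC00 <= nxt[1] <= 0xDFFF:
                cp = 0x10000 + ((unit - 0xD800) << 10) + (nxt[1] - 0xDC00)
                length = 12
                i += 1  # the low surrogate is consumed as part of the pair
            else:
                problems.append((pos, json_text[pos:pos + 6],
                                 "lone high surrogate"))
                i += 1
                continue
        elif 0xDC00 <= unit <= 0xDFFF:
            problems.append((pos, json_text[pos:pos + 6],
                             "lone low surrogate"))
            i += 1
            continue
        else:
            cp = unit
        # Category Cn covers unassigned code points and noncharacters.
        if unicodedata.category(chr(cp)) == "Cn":
            problems.append((pos, json_text[pos:pos + length],
                             "unassigned or noncharacter"))
        i += 1
    return problems
```

Called on the raw text of a JSON document, it returns an empty list for a well-formed pair such as \uD83D\uDE00, and reports \uD83D on its own or \uFFFF. It scans the whole text rather than parsing it, which relies on backslashes appearing only inside strings in valid JSON.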

Norbert

[1] https://tools.ietf.org/html/rfc7158#section-7

> On May 7, 2015, at 5:46, Costello, Roger L. <costello_at_mitre.org> wrote:
>
> Hi Folks,
>
> The JSON specification says that a character may be escaped using this notation: \uXXXX (XXXX are four hex digits)
>
> However, not every sequence of four hex digits corresponds to a Unicode character.
>
> Are there tools to scan a JSON document to detect the presence of \uXXXX, where XXXX does not correspond to any Unicode character?
>
> /Roger
>
Received on Sat May 09 2015 - 11:44:34 CDT
