Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character?

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Thu, 7 May 2015 12:59:54 -0700

I assume that the JSON spec deliberately allows anything that Java and
JavaScript allow. In particular, there is no requirement for a Java String
or JavaScript string to contain "text", or well-formed UTF-16, or only
assigned characters. Some code stores binary data (sequences of arbitrary
16-bit unsigned integers) in a "string", simply because that is easy and
fairly efficient to transport.
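
To illustrate that a JSON parser may hand back such non-text strings, here is a minimal sketch in Python (my choice of language for illustration; the helper name contains_surrogate is hypothetical). It relies on the standard json module decoding an escaped surrogate pair into one supplementary code point while passing an unpaired surrogate escape through unchanged, so any surrogate code point left in the resulting str was unpaired in the input.

```python
import json

def contains_surrogate(s: str) -> bool:
    """Return True if s holds any surrogate code point (U+D800..U+DFFF).

    Hypothetical helper: a Python str is a sequence of code points, so a
    surrogate that survives JSON decoding was not part of a valid pair.
    """
    return any(0xD800 <= ord(ch) <= 0xDFFF for ch in s)

# A well-formed surrogate pair: decoded to the single code point U+1F600.
paired = json.loads('"\\ud83d\\ude00"')

# A lone high surrogate: not well-formed UTF-16, yet the parser accepts it.
lone = json.loads('"\\ud800"')

print(contains_surrogate(paired), contains_surrogate(lone))
```

Whether to reject the second string depends on whether the field is meant to carry text at all.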

You should "validate" *text* only when you are certain that it is indeed
text. And when you do validate, you might want to be narrower than
"assigned character"; for example, you might require Unicode identifiers or
XML NMTOKENS or whatever. Also remember that "assigned" and "identifier"
and such depend on the version of Unicode your library currently implements.
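
As a rough sketch of what such validation might look like, again in Python and with hypothetical helper names: the unicodedata module reports the General_Category from whatever Unicode Character Database ships with the interpreter, so the same code point can be "unassigned" on one installation and "assigned" on a newer one. The identifier check uses str.isidentifier(), which follows Python's own identifier rules (built on XID_Start and XID_Continue) and is only one example of a narrower test.

```python
import unicodedata

# Unicode version of the database bundled with this interpreter;
# results of the checks below depend on it.
UCD_VERSION = unicodedata.unidata_version

def is_assigned(ch: str) -> bool:
    """True unless the code point has General_Category Cn in the bundled database.

    Cn covers unassigned code points and noncharacters. Surrogates (Cs) and
    private-use code points (Co) are not Cn, so handle them separately if needed.
    """
    return unicodedata.category(ch) != "Cn"

def all_assigned(text: str) -> bool:
    """True if every code point in text passes is_assigned."""
    return all(is_assigned(ch) for ch in text)

def is_identifier(text: str) -> bool:
    """A narrower check than 'assigned': Python's identifier syntax."""
    return text.isidentifier()

print(UCD_VERSION, all_assigned("abc"), is_identifier("caf\u00e9"), is_identifier("a b"))
```

Recording UCD_VERSION alongside validation results makes it easier to explain why two systems disagree about the same string.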

markus
Received on Thu May 07 2015 - 15:02:22 CDT
