Re: Ways to detect that XXXX in JSON \uXXXX does not correspond to a Unicode character? from Daniel Bünzli on 2015-05-08 (Unicode Mail List Archive)

From: Daniel Bünzli <daniel.buenzli_at_erratique.ch>
Date: Fri, 8 May 2015 14:32:51 +0200

On Friday, 8 May 2015 at 13:48, Philippe Verdy wrote:
> JSON came initially from Javascript, and it is used extensively with Javascript.

But not *only* with JavaScript, and that has been the case for a long time now.

> The RFC is deviating from the currently running implementations.

Well, did you test them all? There's quite a big list at http://www.json.org. Taking a random one mentioned on that page leads me to http://golang.org/pkg/encoding/json/, whose documentation says that invalid UTF-16 surrogate pairs are replaced by U+FFFD. This is really not very surprising: Go's strings, when treated as text, are apparently UTF-8 encoded, and when you need to produce your results as UTF-8 you don't have a lot of options: error and/or U+FFFD.
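
For anyone who wants to check that behaviour, here is a minimal sketch of what I would expect from Go's encoding/json, going by its documentation. The helper name decodeJSONString is mine and purely illustrative; only json.Unmarshal comes from the package. It decodes one JSON string literal holding a lone high surrogate escape and one holding a valid surrogate pair:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString is a hypothetical helper (not part of encoding/json):
// it decodes a single JSON string literal into a Go string.
func decodeJSONString(lit string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(lit), &s)
	return s, err
}

func main() {
	// A lone high surrogate escape: not an error, replaced by U+FFFD.
	lone, err := decodeJSONString(`"\ud800"`)
	fmt.Printf("%+q %v\n", lone, err) // "\ufffd" <nil>

	// A well-formed surrogate pair decodes to one scalar value, U+1F600.
	pair, err := decodeJSONString(`"\ud83d\ude00"`)
	fmt.Printf("%+q %v\n", pair, err) // "\U0001f600" <nil>
}
```

Note that the decoder reports no error in the first case, so a caller who wants to reject such input has to look for U+FFFD in the result (and accept that a legitimately escaped \ufffd looks the same).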

In any case, deviating or not, that's for the better, since it would be insane to impose JavaScript's strings as a data structure on an interchange format that intends to be universal and *textual*.

Best,

Daniel
Received on Fri May 08 2015 - 07:34:01 CDT