Re: Applying Postel's Law to XML, from a Unicode perspective? from Daniel Bünzli on 2015-06-28 (Unicode Mail List Archive)

From: Daniel Bünzli <daniel.buenzli_at_erratique.ch>
Date: Sun, 28 Jun 2015 14:25:24 +0100

On Sunday, 28 June 2015 at 13:31, Costello, Roger L. wrote:
> Can you think of Unicode errors in inbound XML documents that a web service might be willing to accept?

It depends a bit on your use case and setting (e.g. on the web, security may need to be taken into account). One thing you could do is avoid hard failures on character stream decoding errors: notify the user of the problem, replace the offending bytes with the Unicode replacement character U+FFFD until you manage to resynchronize the UTF-{8,16} byte stream, and see whether the parse can still complete.
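
As a rough illustration of that approach, here is a minimal Python sketch. It is not taken from any particular XML parser: the function name `decode_best_effort` and the error-handler name "report-replace" are made up for this example. It relies on the standard `codecs.register_error` hook, which lets the decoder call back on each malformed sequence, so the offending byte range can be recorded for the user and replaced by U+FFFD before decoding resumes.

```python
import codecs


def decode_best_effort(data: bytes, encoding: str = "utf-8"):
    """Decode `data`, replacing each malformed sequence with U+FFFD.

    Returns the decoded text and a list of (start, end) byte offsets of
    the sequences that were replaced, so the caller can report them
    instead of recovering silently.
    """
    problems = []

    def record_and_replace(exc):
        # Only decoding errors are handled here; anything else is re-raised.
        if not isinstance(exc, UnicodeDecodeError):
            raise exc
        problems.append((exc.start, exc.end))
        # Emit U+FFFD and resume decoding right after the offending bytes.
        return ("\ufffd", exc.end)

    # "report-replace" is a hypothetical handler name chosen for this sketch.
    # Re-registering on each call keeps the example short; it is not
    # thread-safe as written.
    codecs.register_error("report-replace", record_and_replace)
    text = data.decode(encoding, errors="report-replace")
    return text, problems


# A Latin-1 "é" (byte 0xE9) leaked into a document that claims to be UTF-8.
text, problems = decode_best_effort(b"<a>caf\xe9</a>")
for start, end in problems:
    print(f"warning: malformed bytes at offsets {start}..{end} replaced by U+FFFD")
```

The resulting `text` can then be handed to the XML parser as usual, and the collected offsets give the user enough information to go back and fix the exporting software.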

In practice such semi-broken XML documents can be produced by the export procedures of legacy software, which fail to correctly encode some of the more unusual characters they hold in another legacy encoding. These documents should eventually be corrected, so the recovery should not be done *silently*, but it's nicer to the user if your import procedures are "best-effort" and can recover from these kinds of error conditions.

Best,

Daniel
Received on Sun Jun 28 2015 - 08:26:58 CDT
