Re: Applying Postel's Law to XML, from a Unicode perspective?

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Sun, 28 Jun 2015 15:26:22 +0200

For XML there is in fact no problem at all: XML (and also JSON) requires a
single root element to be valid.
If a BOM is followed by an element and that BOM is interpreted as part of a
text element, the document is not conforming XML, since no character data
may precede the root element.
If a BOM is followed by an XML declaration, it cannot be a text element
(the XML declaration must come before any other element).
The only possibility of ambiguity is an XML document that consists only of a
single text element (possibly embedding comments), with no other element and
no XML declaration. Such a document is in fact pure plain text (the only
exception being the predefined named entities and numeric character
references, which start with "&" and end with ";").
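A minimal Python sketch of the liberal-receiver, conservative-sender idea for
XML, assuming UTF-8 input and using only the standard library's
xml.etree.ElementTree (the function name accept_and_normalize is
hypothetical). Because only the prolog may precede the root element, a
leading U+FEFF can only be the encoding signature, so stripping it cannot
change the meaning of a conforming document:

```python
import codecs
import xml.etree.ElementTree as ET


def accept_and_normalize(data: bytes) -> bytes:
    """Accept UTF-8 XML with or without a leading BOM; re-serialize without one.

    Hypothetical helper for illustration only.
    """
    # Liberal input: a leading U+FEFF cannot be character data here, because
    # only the prolog (XML declaration, doctype, comments, PIs, whitespace)
    # may precede the root element.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Raises xml.etree.ElementTree.ParseError if the input is not well-formed.
    root = ET.fromstring(data)
    # Conservative output: UTF-8 bytes, no BOM. With encoding="utf-8",
    # ElementTree also omits the XML declaration by default.
    return ET.tostring(root, encoding="utf-8")
```

ElementTree drops comments and the XML declaration when re-serializing, so
this only illustrates the BOM handling, not a faithful round-trip of the
whole document.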

In summary, there is no problem at all for XML (or JSON, or other
text-encoded syntaxes where a leading ZWNBSP cannot be valid syntax;
JavaScript even treats U+FEFF as ignorable whitespace).
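The same reasoning for JSON, as a small Python sketch (parse_json_lenient is
a hypothetical name). JSON only allows space, tab, LF and CR as insignificant
whitespace, so a leading U+FEFF can never start a valid JSON text and a
lenient receiver can drop one. Python's own json.loads rejects such a str
with a JSONDecodeError, which is why the strip is done explicitly here:

```python
import json

BOM = "\ufeff"


def parse_json_lenient(text: str):
    """Parse JSON text, tolerating a single leading U+FEFF (hypothetical helper)."""
    # U+FEFF is not JSON whitespace, so a conforming text never starts with it;
    # dropping one occurrence cannot change the meaning of a valid document.
    if text.startswith(BOM):
        text = text[1:]
    # Anything else that is invalid is still rejected (conservative beyond the BOM).
    return json.loads(text)
```

Only one leading U+FEFF is tolerated; a doubled one is still an error, which
keeps the receiver liberal about the BOM but strict about everything else.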

The theoretical ambiguity exists only with (unstructured) plain text, which
has no defined syntax to restrict its validity. For that case, plain text
should carry a MIME type in its transport headers that defines how a BOM is
handled. Where possible, if the text itself starts with a ZWNBSP, the
transport layer should double it so that it is interpreted correctly. In
practice, however, unstructured plain-text documents never need to start
with ZWNBSP. The only exception is short, individual plain-text database
fields, and these are rarely needed without a container (this includes CSV
files, whose text fields should be surrounded by quotation marks, or which
start with a header row naming the columns; neither ever needs a leading
ZWNBSP).
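A sketch of the doubling convention described above, under the assumption
that both sides agree on it (the function names are hypothetical). The
sender doubles a genuine leading ZWNBSP; the receiver always treats exactly
one leading U+FEFF as a BOM, which Python's "utf-8-sig" codec does when
decoding:

```python
BOM = "\ufeff"  # U+FEFF: a BOM when leading, otherwise ZWNBSP


def encode_plain_text(text: str) -> bytes:
    """Sender side (hypothetical convention): double a genuine leading ZWNBSP."""
    if text.startswith(BOM):
        text = BOM + text
    return text.encode("utf-8")


def decode_plain_text(data: bytes) -> str:
    """Receiver side: drop exactly one leading U+FEFF, treating it as a BOM."""
    # The "utf-8-sig" codec strips a single leading UTF-8 BOM if present
    # and otherwise decodes exactly like "utf-8".
    return data.decode("utf-8-sig")
```

A text that really begins with ZWNBSP survives the round trip, and input
from a sender that always prepends a BOM is still accepted.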
Being liberal does not really introduce a security issue, even for
digitally signed texts. Signed plain texts have other requirements anyway,
related to the interpretation of line breaks and whitespace: the simple fix
is to start the text with an empty line, and to collapse line breaks and
whitespace to a single space before computing the digital signature
(hash/digest).
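The whitespace collapsing before hashing might look like this Python sketch.
SHA-256 is an assumption (the message only says "hash / digest"), as is
ignoring a single leading U+FEFF so that a liberal receiver computes the
same digest as a sender that omitted the BOM; signature_digest is a
hypothetical name:

```python
import hashlib
import re

BOM = "\ufeff"


def signature_digest(text: str) -> str:
    """Digest of a plain text after whitespace canonicalization (sketch)."""
    # Assumption: one leading U+FEFF is treated as a BOM and ignored.
    if text.startswith(BOM):
        text = text[1:]
    # Collapse every run of whitespace, including CR/LF line breaks,
    # into a single space before hashing.
    canonical = re.sub(r"\s+", " ", text)
    # Assumption: SHA-256 over the UTF-8 encoding of the canonical form.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Line-ending or indentation changes in transit then no longer invalidate the
signature, while any change to the non-whitespace content still does.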

2015-06-28 14:31 GMT+02:00 Costello, Roger L. <costello_at_mitre.org>:

> Hi Folks,
>
> Postel's Law says:
>
> Be liberal in what you accept, and
> conservative in what you send.
>
> How might Postel's Law be applied to web services that receive XML and
> send out XML?
>
> Here's one idea: a web service is willing to receive UTF-8 XML documents
> containing a pseudo-BOM; the web service sends out UTF-8 XML documents
> without the pseudo-BOM.
>
> Can you think of Unicode errors in inbound XML documents that a web
> service might be willing to accept?
>
> /Roger
>
>
Received on Sun Jun 28 2015 - 08:27:50 CDT
