From: John Cowan (jcowan@reutershealth.com)
Date: Fri Dec 10 2004 - 21:22:11 CST
Philippe Verdy scripsit:
> And I disagree with you about the fact the U+0000 can't be used in XML
> documents. It can be used in URI through URI escaping mechanism, as
> explicitly indicated in the XML specification...
You have a hold of the right stick but at the wrong end. U+0000 can be
encoded in a URI as %00, but that does not mean that the IRIs in system ids
and namespace names (and potentially other places) can contain explicit
U+0000 characters or &#0; escapes either. Both of those are illegal,
and documents that contain them are not well-formed.
In character content and attribute values, U+0000 is not possible.
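To make the distinction concrete, here is a minimal sketch using Python's standard-library expat binding. The helper name is_well_formed and the example.org URI are made up for illustration, and the verdicts reflect what this one parser does rather than any normative statement. A "%00" inside an attribute value is just three ordinary characters to the XML parser, whereas a character reference to U+0000 or a literal NUL byte is rejected.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote


def is_well_formed(doc: bytes) -> bool:
    """Return True if the expat-based parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# U+0000 expressed through URI escaping: plain ASCII as far as XML is concerned.
escaped_in_uri = b'<doc href="http://example.org/a%00b"/>'
# U+0000 expressed as a numeric character reference.
char_reference = b"<doc>&#0;</doc>"
# U+0000 as a literal character in the content.
literal_nul = b"<doc>\x00</doc>"

print(is_well_formed(escaped_in_uri))
print(is_well_formed(char_reference))
print(is_well_formed(literal_nul))

# Only a later URI-processing step turns %00 back into U+0000.
print(repr(unquote("a%00b")))
```

The NUL only comes into existence when something downstream unescapes the URI, which is outside the XML document proper.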
> And the fact that the various character productions, that are normally
> normative, have been changed so often, sometimes through errata that
> were forgotten in the text of the next edition of the standard,
Do you have evidence for this claim?
> The only thing about which I can agree is that XML will forbid surrogates
> and U+FFFE and U+FFFF, but I won't say that a XML parser that does not
> reject NULs or other non-characters or "disallowed" C0 controls is so
> much buggy.
You are of course entitled to your uninformed opinion.
> But all this is also proof that XML documents are definitely NOT
> plain-text documents, so you can't use Unicode encoding rules at the
> encoded XML document level, only at the finest plain-text nodes (these
> are the levels that the productions in the XML standard are trying, with
> more or less success, to standardize).
You can't blindly do *normalization* of XML documents as if they were
plain text. *Encoding* XML documents according to Unicode is of course
possible and desirable.
> As a consequence any process that blindly applies a plain-text
> normalization to a complete XML document is bogus, because it breaks the
> most basic XML conformance, i.e. the core document structure...
In one extraordinarily unlikely case, yes: the appearance of a
combining overlay slash following the ">" that closes a tag will
damage the document if it is NFC-normalized.
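That case can be reproduced with a short sketch, assuming the overlay in question is U+0338 COMBINING LONG SOLIDUS OVERLAY and using Python's unicodedata and expat-based parser (the helper name parses is made up for illustration). NFC composes ">" followed by U+0338 into U+226F NOT GREATER-THAN, so the tag is no longer closed.

```python
import unicodedata
import xml.etree.ElementTree as ET


def parses(doc: str) -> bool:
    """Return True if the expat-based parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Character content that begins with a combining overlay slash,
# placed directly after the ">" that closes the start tag.
original = "<a>\u0338x</a>"

# Normalizing the whole document as if it were plain text.
normalized = unicodedata.normalize("NFC", original)

print(parses(original))
print(normalized == "<a\u226fx</a>")
print(parses(normalized))
```

Normalizing only the text nodes after parsing, rather than the serialized document, leaves the markup delimiters untouched.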
--
John Cowan  jcowan@reutershealth.com
http://www.reutershealth.com  http://www.ccil.org/~cowan
You are a child of the universe no less than the trees and all other acyclic
graphs; you have a right to be here.  --DeXiderata by Sean McGrath