From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Aug 12 2003 - 08:40:02 EDT
From: "Jon Hanna" <jon@spin.ie>
> Lots of different things happen that affect the whitespace of an XML
> document (whether a DOM tree is constructed or not, since it isn't the only
> legal way to process an XML document).
Of course one is not required to build an actual DOM tree; however, XML, HTML
and the like are now defined in terms of the DOM, where the text/xml syntax is
just a serialization. That serialization is the only place where whitespace
normalization is defined (such normalization does not occur at the DOM level,
and an XML document may be serialized with a concrete syntax other than the
one assigned to the "text/xml" MIME type, registered and documented by the
W3C).
When processing XML documents, the DOM is the most important part, and it is
logically separate from the concrete syntax handled by text XML parsers. The
W3C defines very strict rules to ensure that the DOM-equivalent data is
preserved, and whitespace normalization in XML documents serialized as
"text/xml" is mandatory; otherwise it is not a valid "text/xml"
serialization.
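The normalization that lives in the text/xml syntax can be observed with a
minimal sketch, assuming Python's standard xml.dom.minidom (backed by the
expat parser) as a representative DOM builder. The element name "v" and the
attribute name "a" are arbitrary placeholders. A literal tab in an attribute
value becomes a space (attribute-value normalization, XML 1.0 section 3.3.3)
and a literal CR LF in content reaches the DOM as a single LF (end-of-line
handling, section 2.11), while the same characters written as character
references pass through unchanged.

```python
# Minimal sketch (standard library only): literal whitespace is normalized by
# the text/xml syntax, whitespace written as character references is not.
from xml.dom.minidom import parseString


def parse_pair(data):
    """Return (value of attribute "a", concatenated text) of the root element."""
    root = parseString(data).documentElement
    text = "".join(
        node.data for node in root.childNodes if node.nodeType == node.TEXT_NODE
    )
    return root.getAttribute("a"), text


# The same characters written two ways in the serialized form.
LITERAL = b'<v a="x\ty">p\r\nq</v>'              # literal TAB, literal CR LF
REFERENCED = b'<v a="x&#9;y">p&#13;&#10;q</v>'   # same characters as references

print(parse_pair(LITERAL))     # ('x y', 'p\nq')
print(parse_pair(REFERENCED))  # ('x\ty', 'p\r\nq')
```

The DOM only ever sees the normalized values, which is why two serializations
that look alike in a text editor can produce different trees.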
Processing a "text/xml" document in a way that is incompatible with what a
DOM tree builder would create is not conforming. If the result differs, then
it is not XML but a related language (for example HTML or SGML, which use
more "relaxed" syntaxes). In XML, whitespace normalization can be overridden,
using very precise rules, within the parser only, not in the resulting DOM
tree, so it is important to understand each step that leads from the
concrete text/xml syntax to the DOM tree or its equivalents (notably the
successive steps required for parsed entities, named entities, and so on).
No XML application is required to use the "text/xml" MIME syntax, and such
applications exist (for example the serialization and compression formats
used by WAP, MMS, NEC's i-Mode, and SOAP).
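One of those successive steps can be illustrated with a small sketch,
assuming Python's standard xml.dom.minidom as the tree builder: the
replacement text of an internal parsed entity is itself parsed, so markup
inside it ends up as ordinary element nodes in the DOM rather than as
literal text. The entity "sig" and the elements "letter" and "em" are
hypothetical names chosen for illustration.

```python
# Minimal sketch: markup inside an internal entity's replacement text is
# parsed, so the DOM contains an element node where "&sig;" was written.
from xml.dom.minidom import parseString

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE letter [
  <!ENTITY sig "<em>Best regards</em>">
]>
<letter>Thanks. &sig;</letter>"""


def child_summary(data):
    """List (kind, content) for each element or text child of the root."""
    root = parseString(data).documentElement
    summary = []
    for node in root.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            summary.append(("element", node.tagName))
        elif node.nodeType == node.TEXT_NODE:
            summary.append(("text", node.data))
    return summary


print(child_summary(DOC))  # [('text', 'Thanks. '), ('element', 'em')]
```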
If an application does not build the DOM tree, it is still required to
perform namespace resolution and to resolve named entities according to the
standard "text/xml" MIME rules formulated in the W3C reference, including all
of its facets. This is needed so that document properties remain
interoperable independently of the character encoding used in the serialized
document or of its syntactic representation. In my opinion, all XML-based
languages should now be defined in terms of their DOM structure, and each XML
application should be defined by a valid DTD or, better, by a now-standard
XSD schema that can be processed by validating parsers (parsers that
absolutely need to create a DOM-like tree or a flow of tokens with strictly
defined properties, value sets and behavior).
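A streaming consumer shows what an application that builds no tree still
receives from the parser. This sketch assumes Python's standard xml.sax
module with its namespaces feature turned on: the handler gets element names
already resolved to (namespace URI, local name) pairs, and text with the
internal entity "co" and the predefined entity "amp" already expanded. The
namespace URIs and the entity "co" are hypothetical.

```python
# Minimal sketch: a SAX handler builds no tree, yet names arrive
# namespace-resolved and text arrives entity-expanded.
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE p:r [<!ENTITY co "Example Corp">]>
<p:r xmlns:p="urn:example:one" xmlns="urn:example:two">
  <item>&co; &amp; friends</item>
</p:r>"""


class Collector(ContentHandler):
    """Records (namespace URI, local name) pairs and all character data."""

    def __init__(self):
        super().__init__()
        self.names = []
        self.chunks = []

    def startElementNS(self, name, qname, attrs):
        # With namespace processing on, `name` is a (URI, local name) tuple.
        self.names.append(name)

    def characters(self, content):
        self.chunks.append(content)


def stream(data):
    """Parse `data` as a stream; return element names and stripped text."""
    handler = Collector()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return handler.names, "".join(handler.chunks).strip()


names, text = stream(DOC)
print(names)  # [('urn:example:one', 'r'), ('urn:example:two', 'item')]
print(text)   # Example Corp & friends
```

The prefix "p" never reaches the handler as the identity of the element; only
the URI does, which is what makes the data independent of the serialization.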
Without DOM interoperability, XML would be yet another imprecise language
like HTML, with very little reusability due to naming conflicts. This is the
most important benefit of XHTML (strictly based on XML) compared to HTML (4.x
and before) and SGML (all versions), notably when a schema is explicitly
specified for the document and is loaded for validation (some schemas, like
XHTML's, are normative and cannot be changed by authors).
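The naming-conflict point can be sketched with Python's standard
xml.dom.minidom (an assumption; any namespace-aware DOM should behave the
same): an XHTML document embeds a second vocabulary that also defines a
local name "title", and getElementsByTagNameNS keeps the two apart. The
"urn:example:books" namespace and its elements are hypothetical.

```python
# Minimal sketch: two vocabularies share the local name "title";
# namespaces keep them distinct in the DOM.
from xml.dom.minidom import parseString

XHTML_NS = "http://www.w3.org/1999/xhtml"
BOOKS_NS = "urn:example:books"  # hypothetical vocabulary for illustration

DOC = b"""<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:b="urn:example:books">
  <head><title>Reading list</title></head>
  <body><b:book><b:title>Le Petit Prince</b:title></b:book></body>
</html>"""


def titles(data, namespace):
    """Text of every "title" element that belongs to `namespace`."""
    doc = parseString(data)
    return [
        el.firstChild.data for el in doc.getElementsByTagNameNS(namespace, "title")
    ]


print(titles(DOC, XHTML_NS))  # ['Reading list']
print(titles(DOC, BOOKS_NS))  # ['Le Petit Prince']
```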
This archive was generated by hypermail 2.1.5 : Tue Aug 12 2003 - 09:09:43 EDT