Re: Questions on ZWNBS - for line initial holam plus alef

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Aug 12 2003 - 08:40:02 EDT

    From: "Jon Hanna" <jon@spin.ie>

    > Lots of different things happen that affect the whitespace of an XML
    > document (whether a DOM tree is constructed or not, since it isn't the
    > only legal way to process an XML document).

    Of course one is not required to build an actual DOM tree. However, XML,
    HTML and the like are now defined in terms of the DOM, and the text/xml
    syntax is just a serialization. That serialization is the only place where
    whitespace normalization is defined: no such normalization occurs at the
    DOM level, and an XML document may be serialized with a concrete syntax
    other than the one assigned to the "text/xml" MIME type, registered and
    documented by the W3C.

    When processing XML documents, the DOM is the most important part, and it
    is logically separate from the concrete syntax handled by text XML parsers.
    The W3C defines very strict rules to ensure that the DOM-equivalent data is
    preserved. Whitespace normalization of documents serialized as "text/xml"
    is mandatory; without it, the result is not a valid "text/xml"
    serialization.
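
    As a rough illustration of where that normalization happens, here is a
    minimal sketch in Python, assuming the expat-based xml.dom.minidom parser
    from the standard library. The element and attribute names (doc, literal,
    escaped) and the helper text_of are made up for the example. It shows the
    parser normalizing a CRLF line ending and a literal TAB inside an attribute
    value, while a character reference to LF passes through untouched; the
    resulting tree simply stores whatever the parser delivered.

```python
from xml.dom import minidom


def text_of(node):
    """Concatenate the character data of all direct text children."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


# Serialized form (hypothetical document): a CRLF line ending in the content,
# a literal TAB in one attribute value, and a character reference to LF
# (&#10;) in another attribute value.
source = '<doc literal="a\tb" escaped="a&#10;b">line1\r\nline2</doc>'

root = minidom.parseString(source).documentElement

content = text_of(root)                 # CRLF is reported as a single LF
literal = root.getAttribute("literal")  # literal TAB is reported as a space
escaped = root.getAttribute("escaped")  # the referenced LF is kept as LF

print(repr(content))
print(repr(literal))
print(repr(escaped))
```

    The point of the sketch is only that all three decisions are taken while
    reading the serialization; nothing is normalized again once the nodes
    exist.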

    Processing a "text/xml" document in a way that is incompatible with what a
    DOM tree builder would create is not conforming. If the result differs,
    then the language is not XML but a derived one (for example HTML or SGML,
    which use more "relaxed" syntaxes). In XML, whitespace normalization can be
    overridden using very precise rules within the parser only, not in the
    resulting DOM tree, so it is important to understand each step that leads
    from the concrete text/xml syntax to the DOM tree or its equivalents
    (notably the successive steps required for parsed entities, named
    entities, ...). No XML application is required to use the "text/xml" MIME
    syntax, and examples of alternatives exist (for example the serialization
    and compression formats used by WAP, MMS, NEC's i-Mode, and SOAP).
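
    To make the "one tree, several serializations" idea concrete, here is a
    small sketch, again assuming Python's xml.dom.minidom. The two byte strings
    are hypothetical documents that differ in character encoding, quoting,
    attribute order, and in how the special characters are escaped; the helper
    shape is my own reduction of an element to a comparable value, not part of
    any DOM API.

```python
from xml.dom import minidom


def shape(node):
    """Reduce an element to (name, sorted attributes, children).

    Adjacent text and CDATA nodes are merged, because how character data is
    split into nodes is a parser detail, not part of the document content.
    """
    children = []
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            children.append(shape(child))
        elif child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            if children and isinstance(children[-1], str):
                children[-1] += child.data
            else:
                children.append(child.data)
    return (node.tagName, sorted(node.attributes.items()), children)


# Same content, two different concrete serializations (hypothetical data).
doc_utf8 = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<p lang="fr" id="x">caf\u00e9 &amp; th\u00e9</p>'
).encode("utf-8")

doc_latin1 = (
    "<?xml version='1.0' encoding='ISO-8859-1'?>"
    "<p id='x' lang='fr'>caf&#xE9; <![CDATA[&]]> th\u00e9</p>"
).encode("iso-8859-1")

tree_a = minidom.parseString(doc_utf8).documentElement
tree_b = minidom.parseString(doc_latin1).documentElement

print(shape(tree_a) == shape(tree_b))
```

    Encoding declarations, quote style, CDATA sections and character
    references all belong to the serialization, so they leave no trace in the
    compared structure.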

    If an application does not build the DOM tree, it is still required to
    perform namespace resolution and to resolve named entities according to the
    standard "text/xml" MIME rules formulated in the W3C reference, including
    all its facets. This is needed for interoperability of document properties
    independently of the character encoding used in the serialized document, or
    of its syntactic representation. In my opinion, all XML-based languages
    should now be defined in terms of their DOM structure, and each XML
    application should be defined by a valid DTD, or better, by a now-standard
    XSD schema that can be processed by validating parsers (parsers that
    absolutely need to create a DOM-like tree or a flow of tokens with strictly
    defined properties, value sets and behavior).
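
    For an application that works on a flow of tokens rather than a tree, the
    same obligations can be sketched with Python's xml.parsers.expat in
    namespace mode. This is only an illustration under my own assumptions: the
    namespace URI urn:example:notes, the entity name org and the helper events
    are invented for the example.

```python
from xml.parsers import expat


def events(source):
    """Stream a document and return a list of (kind, value) events.

    Element names are reported as "namespace-URI local-name", and internal
    entities are expanded by the parser before text is delivered.
    """
    out = []
    parser = expat.ParserCreate(namespace_separator=" ")

    def start(name, attrs):
        out.append(("start", name))

    def chars(data):
        # Merge consecutive text chunks; chunking is a parser detail.
        if out and out[-1][0] == "text":
            out[-1] = ("text", out[-1][1] + data)
        else:
            out.append(("text", data))

    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.Parse(source, True)
    return out


# Two hypothetical documents: one uses a prefix and an internal entity,
# the other a default namespace and literal text.
doc_a = (
    '<!DOCTYPE h:note [<!ENTITY org "W3C">]>'
    '<h:note xmlns:h="urn:example:notes">by &org;</h:note>'
)
doc_b = '<note xmlns="urn:example:notes">by W3C</note>'

stream_a = events(doc_a)
stream_b = events(doc_b)

print(stream_a)
print(stream_a == stream_b)
```

    No tree is built here, yet both documents yield the same expanded element
    name and the same text, which is what a consumer of either serialization
    should be able to rely on.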

    Without DOM interoperability, XML would be another imprecise language like
    HTML, with very little reusability due to naming conflicts. This is the most
    important benefit of XHTML (strictly based on XML) over HTML (4.x and
    earlier) and SGML (all versions), notably when a schema is explicitly
    specified for the document and is loaded for validation (some schemas, like
    that of XHTML, are normative and cannot be changed by authors).


