Re: Multi-Lingual Project Gutenberg (was: Unicode plain text)

From: John Fieber (jfieber@indiana.edu)
Date: Tue May 27 1997 - 16:03:50 EDT


On Tue, 27 May 1997, Marion Gunn <mgunn@egt.ie> wrote:

> The general consensus there was that
> incipient XML was being very heavily pushed as an alternative to html by
> SUN and MICROSOFT in collaboration

Sun has been actively involved in the development of XML, so
their position is no surprise. Lately Microsoft has been jumping
on the "standards" bandwagon (witness the ditching of WINS for
DNS, the adoption of Kerberos, etc.). A move to XML in particular
takes a distinctly different direction from Netscape, whose
founder has publicly stated that SGML is stupid, a position I
firmly believe will only hasten Netscape's death if it persists.

> (as an alternative which would eliminate
> markup language altogether from the actual text to be transferred).

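This is nonsensical. XML does not eliminate markup from the text
being transferred; the markup still travels with the document.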

In the world of HTML, you have a fixed set of tags you can use in
your documents, and you must assume that the browser knows how to
do something sensible with them (not always safe). With XML, or
SGML for that matter, your document gets marked up using tags
appropriate for the data being marked up. The document gets sent
to the browser along with a style sheet so that the browser can
do something sensible when it encounters the markup. This allows
for (a) more concise and precise markup of the document and (b)
more precise control over the ultimate rendering by the browser.
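
As a rough illustration of the paragraph above, here is a minimal
sketch in Python. The tag names (poem, title, stanza, line) and
the STYLE table are made up for the example, and the dictionary
is only a stand-in for a real style sheet language. The point is
that the renderer knows nothing about the tags in advance; the
style information shipped with the document tells it what to do.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tags are chosen to fit the data (a poem),
# not taken from a fixed set such as HTML's.
DOCUMENT = """<poem>
  <title>An Example</title>
  <stanza>
    <line>First line of verse</line>
    <line>Second line of verse</line>
  </stanza>
</poem>"""

# Hypothetical "style sheet": one rendering rule per tag name.
STYLE = {
    "title": lambda text: text.upper(),
    "line": lambda text: "    " + text,
}


def render(xml_text, style):
    """Return plain-text lines, applying the style rule for each known tag."""
    root = ET.fromstring(xml_text)
    output = []
    for element in root.iter():
        rule = style.get(element.tag)
        text = (element.text or "").strip()
        # Elements without a rule (poem, stanza) contribute no output here.
        if rule is not None and text:
            output.append(rule(text))
    return output


rendered = render(DOCUMENT, STYLE)
print("\n".join(rendered))
```

Swapping in a different STYLE table changes the rendering without
touching the document, which is the separation described above.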

The push for XML represents a "back to the roots" movement. The
basic premise of SGML is that it is impossible to define a markup
language that is both general and precise. Thus, SGML is a
meta-language; a language for defining markup languages. At a
technical level, SGML standardizes parsing--how to distinguish
markup from data. HTML is just a single markup language defined
in terms of SGML. However, the promotion of HTML as a universal
exchange format is fundamentally at odds with the spirit of SGML.
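
To make the "standardizes parsing" point concrete, here is a
small Python sketch. It uses an XML parser as a stand-in for an
SGML one (an assumption made for convenience), and the recipe and
memo vocabularies are invented for the example. The same parser
separates markup from data in both, with no built-in knowledge of
either markup language.

```python
import xml.etree.ElementTree as ET


def split_markup_and_data(xml_text):
    """Return (tag names, character data) from a vocabulary-agnostic parse."""
    root = ET.fromstring(xml_text)
    # Markup: the element names, in document order.
    tags = [element.tag for element in root.iter()]
    # Data: the non-blank text found between the tags.
    data = [chunk.strip() for chunk in root.itertext() if chunk.strip()]
    return tags, data


# Two hypothetical markup languages with nothing in common.
RECIPE = "<recipe><ingredient>flour</ingredient><step>Mix well</step></recipe>"
MEMO = "<memo><to>Marion</to><body>XML keeps the markup</body></memo>"

recipe_tags, recipe_data = split_markup_and_data(RECIPE)
memo_tags, memo_data = split_markup_and_data(MEMO)
print(recipe_tags, recipe_data)
print(memo_tags, memo_data)
```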

A problem with using SGML in a web environment is the complexity
of the software required to implement the parsing rules. Enter
XML. XML basically does away with numerous non-essential
features of SGML that complicate parsing, things like tag
omission and minimization, shortrefs and the like. XML also
raises the compliance bar on character encoding from 7-bit ASCII
to Unicode.
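
A short Python sketch of both points, using the standard
library's XML parser. The sample documents and the helper name
is_well_formed are invented for the example. The first input
leaves out end tags the way SGML tag omission would allow; an XML
parser refuses it, which is exactly what keeps the parser simple.
The last input shows non-ASCII text going through unchanged.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return True if the XML parser accepts the input, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


# SGML-style tag omission: the <item> end tags are left out.
OMITTED_END_TAGS = b"<list><item>one<item>two</list>"
# The same data with every element explicitly closed.
EXPLICIT_END_TAGS = b"<list><item>one</item><item>two</item></list>"
# Non-ASCII text encoded as UTF-8, the default when no encoding is declared.
UNICODE_TEXT = "<focal>Gaeilge: \u00e9, \u00ed, \u00fa</focal>".encode("utf-8")

omitted_ok = is_well_formed(OMITTED_END_TAGS)
explicit_ok = is_well_formed(EXPLICIT_END_TAGS)
decoded = ET.fromstring(UNICODE_TEXT).text
print(omitted_ok, explicit_ok, decoded)
```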

-john


