re:Multi-Lingual Project Gutenberg (was: Unicode plain text)

From: John Fieber (
Date: Wed May 28 1997 - 09:34:16 EDT

On Tue, 27 May 1997, Unicode Discussion wrote:

> In message "re:Multi-Lingual Project Gutenberg (was: Unicode plain text)",
> '' writes:
> > I was simply making the observation that swearing off high level
> > protocols because they are messy now seems very out of character
> > with the spirit of Unicode.
> I don't see them as messy, just as short-lived. I don't perceive
> HTML as messy, quite the opposite (notwithstanding frequent abuse by
> authors such as using <Hn> tags to get bold/bigger), but I don't expect
> to still use it in 30 years.


> I don't know SGML, but let's try the exercise with an HTML page I
> wrote (chosen randomly amongst the ones I can show outside):

Let me try and clear up some basic (but common) misunderstandings
about SGML, HTML and the relationship between the two. SGML is a
metalanguage used to create markup languages, and it lays down
parsing rules that software uses to distinguish markup from data.
HTML is a *specific* markup language defined using SGML.

If you write software that understands SGML, then by definition
it understands HTML and any other markup language you may choose
to define with it. Unfortunately, most HTML software today is
hardwired to HTML and will become obsolete when HTML falls out of
fashion. Not surprisingly, it is generally produced by people who
don't understand what SGML is about.
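
To make the hardwired/generic distinction concrete, here is a
minimal sketch in Python. It is NOT an SGML parser: it reads no
DTD and ignores tag omission, entities, comments and attribute
handling. The names tokenize and declared_elements are
hypothetical; declared_elements merely stands in for what a real
SGML system would learn from a DTD. The point is only that the
element names arrive as data instead of being built into the
code.

```python
import re

# A tag looks like "<name ...>" or "</name>"; everything else is data.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>")


def tokenize(document, declared_elements):
    """Split a document into ('start' | 'end' | 'data', value) tokens.

    declared_elements is an assumption of this sketch: a plain set of
    element names standing in for the declarations of a DTD.
    """
    declared = {name.lower() for name in declared_elements}
    tokens = []
    pos = 0
    for match in TAG.finditer(document):
        name = match.group(2).lower()
        if name not in declared:
            # Report undeclared elements instead of guessing at them.
            raise ValueError("undeclared element: " + name)
        if match.start() > pos:
            tokens.append(("data", document[pos:match.start()]))
        tokens.append(("end" if match.group(1) else "start", name))
        pos = match.end()
    if pos < len(document):
        tokens.append(("data", document[pos:]))
    return tokens
```

The same function handles an HTML-like vocabulary such as
{"ul", "li"} or an invented one such as {"memo", "to"} without
any change to the code, which is the property a hardwired HTML
tool lacks.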

SGML, which began in the 1960s, was formalized as an ISO standard
in 1986, is quite stable, and is showing absolutely no signs of
fading anytime soon. HTML as a specific markup language will
probably be out of fashion in 30 years (or 3...). Tools that
know only HTML will become obsolete along with it, but any *SGML*
tool in the future will be able to make sense of it as readily as
the tools of today do. I would argue that SGML is at least as
stable as the character encoding standards used to create SGML
documents, and in some cases even more so.

SGML was designed specifically to address the "here today, gone
tomorrow" problem of markup languages, a point *completely*
missed by most people whose only encounter with it is through
HTML. :(

> Same with tags stripped (almost illegible: headings, bullets gone)

Then try this trivial exercise: replace <li> with a bullet
instead of just stripping it. Center headlines if you like. It
isn't rocket science. Find and replace can handle most of it. A
few macros could be used if you want to get fancy.
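
As a rough illustration of that find-and-replace approach, here
is a sketch in Python using only the standard re and html
modules. The function name html_to_text, the 72-column default
width and the "  * " bullet are assumptions made for the
example. Regular expressions are no substitute for a real
parser, but they cover the simple case described above.

```python
import html
import re


def html_to_text(source, width=72):
    """Turn simple HTML into plain text, keeping bullets and headings."""
    # Replace each <li> start tag with a bullet instead of dropping it.
    text = re.sub(r"<li\b[^>]*>\s*", "\n  * ", source, flags=re.I)

    # Center the contents of <h1>..<h6> on a line of their own.
    def center(match):
        inner = re.sub(r"<[^>]+>", "", match.group(2)).strip()
        return "\n" + inner.center(width).rstrip() + "\n"

    text = re.sub(r"<h([1-6])\b[^>]*>(.*?)</h\1\s*>", center, text,
                  flags=re.I | re.S)

    # Strip whatever tags remain, then decode entities such as &amp;.
    text = re.sub(r"<[^>]+>", "", text)
    return html.unescape(text).strip("\n")


print(html_to_text("<h1>Title</h1><ul><li>one<li>two</ul>", width=20))
```

Headings are handled before the generic tag stripping so that
their text can still be found and centered.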

