Re: Plain Text

From: John Cowan (cowan@locke.ccil.org)
Date: Tue Jul 06 1999 - 10:28:35 EDT


Edward Cherlin wrote:

> This is the key point for me. You acknowledge the need for flavors of text
> other than your preformatted plain text. I thought you were holding out for
> one flavor only.

Indeed, but "preformatted plain text" has traditionally been called
"plain text", or in MIME "text/plain", and this terminology ought not
to be revised unwarrantedly. Other species of plain text should have
a distinguishing adjective.

> Now we can discuss the flavors, such as delimited database
> interchange files with lines of arbitrary length.

We can, but I think we would do well to nail down preformatted
plain text (aka "plain text") first, as it is the most
stable.

> Presumably we can define
> them using some of the apparatus that is becoming available in XML or as
> MIME data types. Would it make sense, then, to create a formal XML
> definition of plain text files, with a leading BOM, no interpretations for
> any tags, the minimum set of control characters, and the appropriate set of
> transformation formats?

No, at least for the XML part. (You could create a full-SGML
definition, but I question the purpose of it, except perhaps
to help in defining a Unicode-preformatted-plain-text grove model.)

XML compels special interpretations for "<" and "&"
and requires matching enclosing tags; preformatted plain text
has no such requirements.
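
A minimal sketch of that point, assuming Python's standard-library
parser stands in for any conforming XML processor (the sample string
and the <text> wrapper element are hypothetical, chosen only for
illustration):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed_xml(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Ordinary plain text that happens to contain "<" and "&".
plain = "if a < b && c > d then stop"

# Wrapping the text in a root element is not enough: "<" and "&"
# still carry markup meaning inside the element.
wrapped_raw = "<text>" + plain + "</text>"

# Only after escaping does the content survive an XML parser.
wrapped_escaped = "<text>" + escape(plain) + "</text>"

print(is_well_formed_xml(plain))            # no enclosing tags
print(is_well_formed_xml(wrapped_raw))      # unescaped "<" and "&"
print(is_well_formed_xml(wrapped_escaped))  # escaped and enclosed
```

The escaping step is exactly the "special interpretation" that
preformatted plain text does not have: a plain-text reader shows the
string as-is.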

> That would get around my earlier objection, about
> how to make an implementation available on all platforms. What about
> corresponding MIME types?

The corresponding MIME type is "text/plain; charset=utf-8" or
"text/plain; charset=utf-16".

Anything else should have a different MIME type or at least
different parameters.
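
For illustration, a minimal sketch of labelling a message body this
way, assuming Python's standard email package (the helper name
make_plain_text_message and the sample body are hypothetical, and
only the utf-8 case is shown):

```python
from email.message import EmailMessage


def make_plain_text_message(body: str) -> EmailMessage:
    """Build a message whose body is labelled text/plain; charset=utf-8."""
    msg = EmailMessage()
    # subtype="plain" gives text/plain; the charset parameter is what
    # tells the receiver how to decode the bytes into characters.
    msg.set_content(body, subtype="plain", charset="utf-8")
    return msg


msg = make_plain_text_message("Preformatted text: a < b & c\n")

print(msg.get_content_type())     # the media type without parameters
print(msg.get_content_charset())  # the charset parameter
```

A different flavor of text would, as stated above, be distinguished
by a different type or by additional parameters on the same
Content-Type header.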
 
> Preformatted or reflowable.

I have not seen ones that are not preformatted.
 
> > . Traditional (not "legacy") email and netnews.
>
> There is presently no way to specify preformatted or reflowable.

There is a widespread presumption for preformatted, although
sometimes the formatting is done by the creating software, not the user,
alas. Rendering software usually has at least an option to
display as-is.
 
> To summarize your answer to my objections, we are defining a new format
> independent of previous conventions, in which we can specify usage of the
> minimal set of formatting characters regardless of usage in text files of
> 7-bit ASCII and 8-bit character sets of any kind,

Yes.

> while allowing for a few
> variant flavors of text, such as preformatted, reflowable, and database.

And of these, preformatted is the most important and stable, and
should be specified first. The others can be specified ad libitum
later.

-- 
John Cowan	http://www.ccil.org/~cowan		cowan@ccil.org
   Schlingt dreifach einen Kreis um dies! / Schliesst euer Aug vor heiliger Schau,
   Denn er genoss vom Honig-Tau / Und trank die Milch vom Paradies.
			-- Coleridge / Politzer


