Re: Unicode plain text

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Mon May 26 1997 - 11:26:38 EDT


> We seem to have two different requirements for plain text here.
> Now my assumption was that we would mostly want to use one type, whereas
> there seems to be a strong demand for another.
> ...
> First the type I had assumed as the default.
> I would call this logical formatting.
>
> Paragraph Separator is most commonly used. Text usually runs on without
> any control characters until a new paragraph is needed. Since this
> is logical formatting the author does not know or care whether a
> paragraph is indicated by a completely blank line or a new line is
> started with an indent or some other convention.
>
I suppose this is, indeed, a form of plain text, but I would call it "input
for a text formatter", not text to be used and viewed on its own as it stands.
It is a degenerate case of a larger class, e.g. input for TeX, Scribe, Troff,
IPFC, SGML, or HTML (for text formatting). It is only in the last few years
that I began to receive "long-line" text in email, and I can only suppose that
it was generated by some sort of editor that does its own word wrapping during
input, but does not send the line breaks on the mistaken assumption that every
email client in the world is (or should be) also a text formatter.

[The second type of plain text...]

> The assumptions behind this explicit approach include:
> * The text will go straight to a printer that is not very bright.
> * The author knows exactly how many characters fit on a line. (Often
> there is also the assumption that each character is fixed width.)
> * The author knows exactly how many lines fit on a page.
> * The author knows in which sequence the characters in a line will
> be printed. (Usually assumes left to right without any reordering.)
>
Right -- this is the kind people have been using for more decades than many of
us have been alive. It does not deserve the bad rap. Of course we all find
it irritating when the composer of such text assumes wider or longer pages
than we have, but that is not a reason to abolish this, the most common form
of plain text -- in fact, it is all the more reason to set standards for its
use. "Standard lines are so wide; standard pages are so long", etc. Such
standards tend to be set of their own volition, e.g. among e-mail and netnews
users, where recipients of badly formatted messages tend to take it on
themselves to educate the senders as to common practice.

Ideally, preformatted plain text can also be fed into your favorite rendering
engine to produce the effect that most pleases your eye, and indeed we have
been doing this sort of thing for decades with many formatters. I grant that
automatic recognition of nested bullet lists or meticulously formatted tables
might be a stretch, but it is certainly not difficult to treat blank lines as
paragraph separators, and otherwise to ignore line breaks when reformatting
prose such as this. But once any kind of markup ("this is a table", "this is
a bullet list", "this is a section of preformatted text") is introduced, our
plain text becomes "input for a text formatter".
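
The blank-line rule above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not any particular formatter's
method: the function name reflow and the default width of 72 columns are
hypothetical choices made for the example, and the sketch handles prose only.
Tables, bullet lists, and quoted text would be mangled by it, which is exactly
the "stretch" noted above.

```python
import re
import textwrap


def reflow(text: str, width: int = 72) -> str:
    """Re-wrap preformatted prose to a new line width.

    Blank lines are treated as paragraph separators; line breaks
    inside a paragraph are treated as ordinary spaces.
    (Name and default width are assumptions for this sketch.)
    """
    # Split on runs of one or more blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    wrapped = []
    for para in paragraphs:
        # Collapse internal line breaks and repeated spaces into single spaces.
        joined = " ".join(para.split())
        # Break the joined paragraph into lines no longer than `width`.
        wrapped.append(textwrap.fill(joined, width=width))
    # Separate the re-wrapped paragraphs with a single blank line again.
    return "\n\n".join(wrapped)


if __name__ == "__main__":
    sample = "First paragraph, broken\nover two lines.\n\nSecond paragraph."
    print(reflow(sample, width=40))
```

The same function also accepts "long-line" text, since a paragraph that
arrives as one very long line is simply wrapped to the requested width.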

Incidentally, another form of plain text is "output from a text formatter",
which often has been hyphenated. Such text is an end result, not intended for
further processing.

I think that living in a world of email has demonstrated the value of plain
text, at least to most people. The lesson is that this is the only text form
that can be sent without prior arrangement with any reasonable expectation
that it will be readable at its destination.

- Frank


