Re: Plain Text

From: Edward Cherlin (edward.cherlin.sy.67@aya.yale.edu)
Date: Fri Jul 02 1999 - 03:11:33 EDT


At 15:32 -0700 6/30/1999, Frank da Cruz wrote:
>> The only thing that is clear about "plain text" is that it is not well
>> defined at all.

My experience is that ASCII plain text is sufficiently well defined but has
been incredibly badly implemented, due in part to the requirement in the
1960s and 1970s for keeping programs as small as possible, and in part to
the rarity of cross-platform file transfer until the 1990s.

The original definition, as John Cowan has pointed out, was anything a
Teletype could reliably render, including overstrikes. Thinking of ASCII as
printer commands rather than text makes it easier to understand the origins
of its problems. (I have used printing terminals and video terminals that
permitted overstrikes, designed for APL in particular and for what you will
in general. Overstriking used to be taught in typing textbooks for creating
signs like the cent sign: c, BS, /.)
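
To make the two readings of BS concrete, here is a minimal sketch in Python. The function names and the list-of-columns representation are my own assumptions for illustration, not any terminal's or editor's actual behaviour.

```python
BS = "\b"  # ASCII backspace, code 8

def backspace_as_delete(line):
    """Treat BS as 'erase the previous character' (the BS = DEL reading)."""
    out = []
    for ch in line:
        if ch == BS:
            if out:
                out.pop()
        else:
            out.append(ch)
    return "".join(out)

def backspace_as_overstrike(line):
    """Treat BS as 'move the carriage back one column' (the printer reading).

    Returns one string per column, holding every character struck there.
    """
    cells = []  # cells[i] collects the characters printed at column i
    col = 0
    for ch in line:
        if ch == BS:
            col = max(col - 1, 0)
        else:
            if col == len(cells):
                cells.append(ch)   # first strike at a new column
            else:
                cells[col] += ch   # strike on top of what is already there
            col += 1
    return cells
```

Under these assumptions the cent-sign sequence c, BS, / comes out as a lone "/" in the first reading and as a single column holding both "c" and "/" in the second.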

The problems we have with ASCII plain text come mainly from a small set of
common variant practices.

Using CR, LF, or CR/LF as a line or paragraph end
Different tab spacings
Optional line wrap
Formfeed codes vs. computed page breaks
BS = DEL or BS-overstrike
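
The first item on that list is the easiest to handle mechanically. What follows is a rough sketch in Python of counting and normalizing the three line-end conventions. The function names are hypothetical, and it assumes the text is already in memory with its original line ends intact.

```python
import re

def count_line_endings(text):
    """Count CR/LF pairs, bare CRs, and bare LFs in a string."""
    crlf = text.count("\r\n")
    cr = text.count("\r") - crlf   # CRs that are not part of a CR/LF pair
    lf = text.count("\n") - crlf   # LFs that are not part of a CR/LF pair
    return {"CRLF": crlf, "CR": cr, "LF": lf}

def normalize_line_endings(text, newline="\n"):
    """Rewrite every CR/LF, bare CR, or bare LF as the chosen line end."""
    # CR/LF is listed first so a pair is replaced once, not twice.
    return re.sub(r"\r\n|\r|\n", newline, text)
```

Note that Python's open() translates line ends on reading by default, so a file would have to be opened with newline="" for the counts to reflect what is actually on disk. The sketch does not try to decide whether a line end marks a line or a paragraph; that still takes a preference setting or a human.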

In the past, editors on one platform, or written for one purpose, ignored
all other practices. I use two text editors, Alpha for Macintosh and
Notespad (note extra 's') for Windows, which can handle all of these
variations according to my preferences, including the ability to read and
write text files with Mac, Windows, or Unix line break codes. Notespad even
maintains an extensible list of file types where line breaking is never to
be changed by the editor (mostly programming language source code). Alpha
asks whether to wrap paragraphs when opening files.

>Actually, it tends to be well-defined for each platform. And then the
>interchange methods among platforms tend to converge on a few simple
>conventions: ASCII (or the appropriate ISO character set, or now UTF-8 or
>other form of Unicode), as opposed to EBCDIC (or Baudot, or Sixbit); CRLFs
>separating lines, and paragraphs separated by blank lines. Somewhat less
>well defined, but nevertheless in common use, are bare Carriage Return or
>Backspace for overstriking, Formfeed for "new page", and Tab for tabbing
>(with several different conventions about tabstops).

That is, we agree on everything except our variant usages.

>Lines are terminated at somewhere between 72 and 80 characters by
>convention, because that's how wide terminal screens are, and before them
>the Teletype carriage, and before that the most common kind of punchcard.
>Or for that matter, typewriters and sheets of paper (A4 or US, take your
>pick :-)
>
>To this day, we follow these conventions in newsgroups and email, although
>now it might be more a matter of "netiquette" than necessity (as in the
>BITNET days, when e-mail was, quite literally, 80-column card images).

     As long as e-mail readers cannot correctly reformat messages with bad
line breaks
     (like this), it will be a matter of real necessity.

>These simple conventions let us format our text exactly the way we want to.
>We can indent or not, we can put line breaks where we want them, we can have
>columns of numbers or other tabular presentations, mathematical expressions,

which actually require several hundred non-ASCII characters, unless you
mean, as so many do, arithmetic expressions.

>and idiosyncratic forms of emphasis. Many people want their text to stay
>the way they wrote it. And many people also are not fond of receiving email
>in every kind of bizarre format that any application developer can dream up
>when it contains, in fact, nothing but words (but I stray).

When I want my text to stay as I wrote it, I put it into a PDF, not a text
file. Others prefer TeX for this purpose, or PostScript.

>> I think the Unix community should slowly get used to the idea of
>> abandoning LFs in the middle of paragraphs in plain text documents and
>> let the editor and display tool perform the reformatting at display
>> time.
>>
>But what IS plain text? Maybe some people might like to have their email
>reformatted, but I don't think they want their C or Fortran or PostScript
>programs to receive the same treatment. Nor, for that matter, poetry or any
>other forms of text where line breaks, indentation, and blank lines serve a
>purpose. As in, for example, the preceding paragraph.

Yes, it's that old Devil cross-cultural ignorance again. It wouldn't
surprise me if some people here had never even read a Fortran program.

>No more plain-text bashing! No more "legacy" saying! Our focus should be
>not on stamping out plain text, but on promoting international multilingual
>communication through a universal character set that does not impose a
>particular modus vivendi upon its users.
>
>- Frank

We raised the question of defining a Unicode plain text format about two
years ago, but nothing seemed to come of it. We also discussed the
possibility of actually *using* Unicode text in this discussion, but
nothing came of that either. Does anyone else here feel excessively
constrained by our lack of glyphs for the characters we talk about? Would
anyone else like to get UTF-8-capable mailers and extensive sets of Unicode
fonts and see what effect they have on our deliberations?

I have made the suggestion before, but here goes again--Alis Technologies
offers a 30-day free trial period of its Tango Browser with Tango E-mail,
downloadable from http://www.alis.com/internet_products/try_form.html. It
runs on Windows 95, 98, and NT. Would anyone care to try it with me?

--
Edward Cherlin                        President
Coalition Against Unsolicited Commercial E-mail
Help outlaw Spam.       <http://www.cauce.org/>
Talk to us at             <news:comp.org.cauce>



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:47 EDT