Re: Plain Text

From: Markus Kuhn (Markus.Kuhn@cl.cam.ac.uk)
Date: Wed Jun 30 1999 - 17:22:48 EDT


Juliusz Chroboczek wrote on 1999-06-30 19:36 UTC:
> You need a paragraph separator and possibly a line break (and perhaps
> a page break). Unicode defines well-standardised codepoints for
> those. If you use other control characters, such as SO/SI for
> controlling boldface or italics, or BS (or CR) for overstriking, or
> terminal control sequences, it ain't plain text no more.

The only thing that is clear about "plain text" is that it is not well
defined at all. There is certainly no ISO standard that gives you any
indication of what "plain text" is. The Unix community feels somewhat
confident about the notion of plain text, just because they have editors
such as ed, vi, emacs, etc. that agree on a common text format that is
so simple that it has become customary to refer to it as plain text.

Many aspects of "plain text" are ill-defined these days:

  a) how do you terminate lines and paragraphs?
  b) is there a terminator after the last line/paragraph?
  c) is the line formatting the task of the sending or the receiving
     process?

For Unix the answers used to be

  a) LF and no paragraph concept
  b) yes
  c) the sender has to insert line breaks

but thanks to the heterogeneity of the Internet, these strict rules have
been weakened significantly in common practice over the past few years.
Some aspects of the classical Unix plain text definition (which
originally came from tty output hardware interfaces) no longer make
sense. For example, inserting LFs in the middle of paragraphs causes
these LFs to move around whenever a few words are changed, which
seriously disrupts revision control tools (e.g., diff and RCS). It is
also no longer adequate now that reformatting web browsers, not 1960s
ttys, are a dominant output device.
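
The effect on line-based diff tools can be illustrated with a small
sketch. It assumes Python and its standard difflib and textwrap modules,
none of which are part of this discussion; the sample sentence, the
30-column width, and the helper name changed_lines are made up for the
illustration. The same few-word edit is applied to a paragraph stored as
one long line and to the same paragraph hard-wrapped with LFs, and the
lines a line-based comparison would report as changed are counted.

```python
import difflib
import textwrap


def changed_lines(old: str, new: str) -> int:
    """Count the lines a line-based comparison reports as removed or added."""
    a, b = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


# Hypothetical sample paragraph and a small edit near its beginning.
old_para = (
    "The quick brown fox jumps over the lazy dog near the quiet river "
    "bank while the sun sets slowly behind the distant hills and the "
    "evening wind begins to rise."
)
new_para = old_para.replace("quick brown", "quick and very clever brown")

# Paragraph stored as a single line: the edit touches exactly that line.
soft_changes = changed_lines(old_para, new_para)

# Paragraph hard-wrapped with LFs by the sender: the inserted words push
# the following line breaks to new positions.
hard_changes = changed_lines(
    textwrap.fill(old_para, width=30),
    textwrap.fill(new_para, width=30),
)

print("changed lines, one line per paragraph:", soft_changes)
print("changed lines, hard-wrapped:          ", hard_changes)
```

In the single-line case the comparison sees one removed and one added
line. In the hard-wrapped case every line after the edit can differ,
although only a few words changed.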

I think the Unix community should slowly get used to the idea of
abandoning LFs in the middle of paragraphs in plain text documents and
letting the editor and display tool perform the reformatting at display
time.
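
A minimal sketch of what display-time reformatting could look like,
again assuming Python and its standard textwrap module. The function
name render and the sample text are hypothetical. The stored document
keeps one LF-terminated line per paragraph, and only the display step
breaks lines to fit the current width.

```python
import textwrap


def render(stored: str, width: int) -> str:
    """Wrap each stored paragraph (one LF-terminated line) for display."""
    # textwrap.fill("") returns "", so empty lines pass through unchanged.
    return "\n".join(
        textwrap.fill(paragraph, width=width)
        for paragraph in stored.split("\n")
    )


# Hypothetical stored document: no LFs inside paragraphs.
stored = (
    "First paragraph that is long enough to need wrapping on a narrow "
    "display.\n"
    "Second paragraph.\n"
)

narrow = render(stored, 20)   # e.g. a small window
wide = render(stored, 200)    # e.g. a very wide window

print(narrow)
```

The stored text never changes when the window is resized; only the
output of the display step does.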

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>


