Re: Plain Text

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Mon Jul 05 1999 - 13:21:57 EDT


[Peter wrote]
> I find myself in agreement with Ken W's comments a few messages back. I'm
> also inclined to say that you are wanting to define (in effect) a MIME
> type, and that part of the confusion / disagreement that has arisen in
> this thread comes about by calling this type "plain text".
>
I most emphatically do not want to define a MIME type, because MIME will
disappear some day but Unicode will last forever (if we do it right).

> You want a file that is tagged with null markup to be interpreted in a
> specific way (as a text document as opposed, e.g. to a database) and with
> specific layout formatting. As was pointed out in an earlier message, and
> as we are all familiar with, sometimes files that contain only text
> characters and no tagging are used for purposes other than this, such as
> the CSV database. Also, there are times when I've had such text files in
> which I intend all of the text that exists between instances of { BOF,
> EOF, NLF } to appear on a single line, regardless of length (e.g. in
> source code), and other times when I expect it to wrap to whatever width
> is appropriate for the window in which it is viewed.
>
All of that is fine. I'm only proposing that we codify existing practice.
If Unicode has a Line Separator (and it does), then if I put it in a file,
it should serve its purpose. Ditto for Paragraph Separator. Ditto for
C0 HT and FF (even though those purposes might be ill-defined), in the
absence of "native" Unicode replacements for them.
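
As a rough illustration of what "serving its purpose" could mean for these
four characters, here is a minimal Python sketch. It is not a proposed
standard: the function name parse_preformatted, the nesting of pages,
paragraphs and lines, and the tab stop of 8 columns are all assumptions made
for the example (the behavior of HT and FF is, as noted, ill-defined).

```python
# Hypothetical sketch: honor LS, PS, HT and FF in a Unicode string.
LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR
FF = "\u000c"  # C0 FORM FEED, treated here as a page break (assumption)


def parse_preformatted(text, tab_width=8):
    """Return a list of pages; each page is a list of paragraphs;
    each paragraph is a list of lines with HT expanded to spaces."""
    pages = []
    for page in text.split(FF):
        paragraphs = []
        for para in page.split(PS):
            # Each LS-delimited line starts at column 0, so tab stops
            # are computed per line. The width of 8 is an assumption.
            lines = [line.expandtabs(tab_width) for line in para.split(LS)]
            paragraphs.append(lines)
        pages.append(paragraphs)
    return pages
```

The explicit split calls are deliberate: Python's str.splitlines() also
breaks on LS, PS and FF, but it does not distinguish between them, which
would lose exactly the structure under discussion.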

I agree that marking a "plain-text" stream as "preformatted" or "to be
flowed" is a higher-level issue. However, we must also agree that plain text
CAN be preformatted and not ALWAYS flowed, and that Unicode already contains
the mechanisms to do it.
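
The difference between the two readings can be sketched as follows. This is
only an assumption-laden example: the render function, its flowed flag, and
the blank line placed between paragraphs are invented for illustration, and
the flag stands in for whatever higher-level marking an application might
use.

```python
# Hypothetical sketch: the same text rendered preformatted or flowed.
import textwrap

LS = "\u2028"  # LINE SEPARATOR
PS = "\u2029"  # PARAGRAPH SEPARATOR


def render(text, width, flowed):
    """Return display lines. Preformatted: LS-delimited lines are kept
    as written, whatever their length. Flowed: each line is additionally
    wrapped to the given width."""
    out = []
    for para in text.split(PS):
        for line in para.split(LS):
            if flowed:
                # textwrap.wrap returns [] for an empty line; keep it.
                out.extend(textwrap.wrap(line, width) or [""])
            else:
                out.append(line)
        out.append("")  # blank line between paragraphs (assumption)
    return out[:-1]  # drop the separator after the last paragraph
```

In both modes LS and PS still force a break; only the treatment of lines
longer than the window differs.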

> All of these are legitimate things to want to be able to do with a file in
> this format that we have always known as "plain text". Neither the
> intended meaning of the content, nor the intended appearance have ever
> been part of the definition of plain text. Thus, I think you should expect
> some objection to any suggestion that "plain text" should refer to a file
> that is intended to be interpreted in a specific way, i.e. as a text
> document with specific layout formatting. Plain text can be neither more
> nor less than what it has always been. As we apply plain text to the
> Unicode context, Ken's comments were on the mark.
>
> That is not to say that it isn't reasonable, or desirable, to specify a
> file format to be used for text documents with specific layout formatting
> such that it will always appear as the author intended, and such that no
> markup is used beyond a standard interpretation of the characters
> (separating this file format from others such as PDF). We'd all benefit
> from it, if an agreement can be made. I just think that we may need to
> call it something else.
>
"Preformatted plain text"? It's not catchy but I think it says what it means.

> I certainly empathise with a desire to have a standard for preformatted
> plain text. Here's the first paragraph of something in a message sent to
> me recently.
>
Yes, "fractured plain text" comes from a flawed conversion algorithm, e.g.
when pasting from a web page into an email window (a "double-ended break"
in this case: misinterpretation of the left margin as leading spaces by the
copier and gratuitous word wrapping by the paster). Obviously that's an
application issue. However, I do believe that if we can establish a
baseline for preformatted plain text, makers of such applications will have
a better idea of how to interchange text.

- Frank


