Re: Plain Text

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Fri Jul 02 1999 - 14:00:05 EDT


> Why are you specifying font characteristics for plain text?
>
Only for purposes of getting across the idea that "long line = paragraph,
break where you please" should not be considered well-formed plain text.
Or, to look at it the other way, that plain text must allow for hard line
breaks, and there should be a convention as to how long we might reasonably
expect lines to be. "Columns" are the only measurement that makes sense
(surely not picas, inches, millimeters, pixels, ...) and this presupposes
fixed spacing.

This might be a farfetched notion except that it is completely consonant
with current practice.

The fact that monospaced fonts have fallen out of fashion should not cloud
our judgement. Naturally they present some difficulties for multilingual
text, but they also provide numerous benefits. They let me compose a text
document that anybody can read in -- barring "rendering engine" interference
-- the same form in which I composed it. Tables line up, columns of numbers
add up, comments in my C program are aligned, etc. All this without our
having to agree in advance on which rendering engine or markup language to
use.
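
To make the "tables line up" point concrete, here is a minimal sketch of
a table built purely by padding with spaces. The function name
format_table and the sample rows are made up for illustration, and the
sketch assumes one code point occupies one column, which holds only for
simple fixed-width text (no wide or combining characters).

```python
def format_table(rows):
    """Align columns using spaces only; no markup or table package.

    Assumption: each character occupies exactly one column, so len()
    can stand in for display width.
    """
    # Widest cell in each column decides that column's width.
    widths = [max(len(str(cell)) for cell in col) for col in zip(*rows)]
    lines = []
    for row in rows:
        cells = []
        for cell, width in zip(row, widths):
            text = str(cell)
            # Numbers are right-aligned so their digits line up.
            if isinstance(cell, (int, float)):
                cells.append(text.rjust(width))
            else:
                cells.append(text.ljust(width))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


rows = [("Item", "Qty"), ("apples", 12), ("kiwis", 3)]
print(format_table(rows))
```

The result lines up only under a fixed-pitch font, which is exactly the
dependency being discussed.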

Parenthetically, look at the mess the craze for the typeset appearance has
gotten us into. If I want to make a table on a Web page or in a typeset
document, I have to use some kind of markup language or "table" package,
rather than just spacing or tabbing the items appropriately. Which is fine
until you consider that any markup language or tables package you are using
today will be long forgotten a few years from now, and so your laboriously
constructed document will either require conversion or be lost forever (or
humans will need to read the markup language directly).

As noted, I grant that the monospace-font model does not apply equally well
to all writing systems, but for the many to which it does apply -- Roman,
Hebrew, Cyrillic, Armenian, Greek, Georgian, etc., and to some extent CJK
since, at least in Japan, they have been using mono- and duospaced fonts on
terminals and PCs for decades, and care as much about things lining up as
anybody else -- should guidelines not be stated up front?

> > 1. Spaces, such as U+0020 and U+00A0, which are "kept" (e.g.
> > adjacent spaces are not collapsed).
>
> I don't see how barring all the other spacing and presentation codes
> (e.g. ZWNJ) improves plain text.
>
They aren't barred -- they are Unicode characters that are not C0 or C1
control characters. And they aren't a higher-level markup language.

> I don't see how specifying the maximum text width is in the purview of
> "plain text." That is suggesting that running my terminal in 132 column
> mode (or printing on wide paper/with narrow fonts,) involves something
> special. I suspect that all the attention to cell widths, column
> counting and what not is to make tab processing map nicely to the
> character cell terminal model. That model is responsible for some
> horrible hacks when it migrated to other countries and I believe the
> difficulties in adapting software that depends on it to writing systems
> it does not work for has been a serious drag on more advanced Unicode
> implementations.
>
I suppose you're right about the intention. That's what the discussion is
for -- to find suitable language for expressing a model for "text that is
already formatted and stands on its own without additional formatting from
any higher intelligence and that can be displayed by the most minimalistic
plain-text viewer", like this email message.

You might be right about specifying a maximum line length. And yet,
if there is to be such a thing as preformatted plain text, and none of us
can deny that there already is such a thing since this is how we communicate,
should there not be some form of guideline as to what is a safe default
line-length, in the absence of any prior agreement to set a different one?
That's what we do now, implicitly. Why not make it explicit? So how should
the guideline be expressed?

Let's assume you are composing some plain text, and you don't care how it's
rendered. Then don't include Line Separators and let the viewer "flow" the
text. That's fine for ordinary prose, but it assumes a viewer that knows
how to flow text, and I'm not sure that a text-flowing viewer should be
assumed or required. As somebody mentioned earlier, most printers will
truncate long lines, as will many terminals and other display devices.
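
The difference between the two kinds of device can be sketched roughly
as follows. The names truncate_lines and flow_text and the width of 40
are made up for illustration; the flowing version leans on Python's
standard textwrap module, and both assume one character per column.

```python
import textwrap


def truncate_lines(text, width):
    """Mimic a printer or terminal that drops everything past `width`."""
    return "\n".join(line[:width] for line in text.split("\n"))


def flow_text(text, width):
    """Mimic a viewer that re-breaks a long line at word boundaries."""
    return textwrap.fill(text, width=width)


long_line = ("Plain text composed as one long line depends on "
             "the viewer to break it.")

print(truncate_lines(long_line, 40))  # the tail of the line is lost
print()
print(flow_text(long_line, 40))       # every word survives
```

Only the second viewer preserves the content, which is why text sent
without line breaks quietly depends on the viewer's capabilities.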

If you do care how the text is rendered, include Line Separators.

> > 4. Paragraph breaks are indicated by two successive Line Separators
> > or by Paragraph Separator, U+2029.
>
> If we are supporting Unicode and have a notion of Paragraph it seems
> reasonable to specify it is denoted with U+2029.
>
Agreed and amended already.
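
For what it's worth, the paragraph rule in item 4 might be sketched like
this. It is only an illustration: split_paragraphs is a made-up name,
and treating CR, LF, CRLF and U+2028 all as "Line Separator" is my
assumption here, not something the draft wording settles.

```python
import re

# A paragraph break is two or more successive line separators,
# or a single Paragraph Separator (U+2029).
PARAGRAPH_BREAK = re.compile("\n{2,}|\u2029")


def split_paragraphs(text):
    """Split plain text into paragraphs, keeping hard line breaks."""
    # Assumption: CRLF, CR and U+2028 are all treated as one line separator.
    normalized = re.sub("\r\n|\r|\u2028", "\n", text)
    # Drop empty pieces produced by leading or trailing breaks.
    return [p for p in PARAGRAPH_BREAK.split(normalized) if p]


sample = "first line\nsecond line\n\nnext paragraph\u2029last paragraph"
for paragraph in split_paragraphs(sample):
    print(repr(paragraph))
```

A single line separator inside a paragraph is kept as is, so hard line
breaks survive the split.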

- Frank


