Re: Plain Text

From: Geoffrey Waigh (anzu@home.com)
Date: Fri Jul 02 1999 - 13:06:26 EDT


Frank da Cruz wrote:
>
> Then let's try again. Let me get the ball rolling with the following simple
> suggestion for Unicode Plain-Text File and Interchange Format:
>
> A monospaced character-cell display device is assumed for the purposes of
> line breaking. Characters that are too wide for a character cell (such as
> Kanjis) occupy a double-width cell. Of course, Unicode Plain Text can also
> be displayed on any other kind of device, in any font, monospaced or not, in
> which case "all bets are off", just as they are now with traditional plain
> text when displayed in a proportional font.

Why are you specifying font characteristics for plain text?

> Conversely, it is recognized that a monospaced (or duospaced) character-cell
> device might be inadequate for display of certain writing systems, such as
> Arabic or Indic scripts, and in this case intelligent rendering engines
> might very well be required. This should, nevertheless, be possible with
> plain text, without the aid of any particular markup scheme.

And then saying that you don't really need a monospace font and it is
still plain text even when you have to do a proper job of rendering it?

>
> Plain text is composed only of Unicode characters, with no meta-level
> of formatting information, presentation hints, etc, except:
>
> 1. Spaces, such as U+0020 and U+00A0, which are "kept" (e.g.
> adjacent spaces are not collapsed).

I don't see how barring all the other spacing and presentation codes
(e.g. ZWNJ) improves plain text.

>
> 2. Horizontal Tabs are indicated by the HT character, U+0009. Tab
> stops shall be assumed every 8 columns, starting at the first. (This
> provision is primarily to facilitate conversion of ASCII and 8-bit
> text to Unicode. Alternatively, it would be OK to force all
> horizontal alignment to be accomplished by spaces.)
>
> 3. Line breaks are indicated by Line Separator, U+2028. Preformatted
> text must break lines at column 79 or less to avoid unwanted
> reformatting. Column numbers are 1-based, relative to the left or
> right margin, according to the prevailing directionality, with
> single-width characters as the counting unit. A line break is
> required at the end of the final line if it is to be considered a
> line. (This is to allow append operations to work in the expected
> fashion.)

I don't see how specifying the maximum text width is in the purview of
"plain text." That suggests that running my terminal in 132-column
mode (or printing on wide paper or with narrow fonts) involves something
special. I suspect that all the attention to cell widths, column
counting and whatnot is there to make tab processing map nicely to the
character-cell terminal model. That model was responsible for some
horrible hacks when it migrated to other countries, and I believe the
difficulty of adapting software that depends on it to writing systems
it does not work for has been a serious drag on more advanced Unicode
implementations.
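
To make concrete what items 2 and 3 ask of software, here is a rough
sketch of the column counting they imply. It is only an illustration
under assumptions that are not in the proposal: "double-width" is
approximated by the East Asian Width classes W and F, combining marks
are assumed to take no cell of their own, and format characters such
as ZWNJ are not handled at all. The names cell_width, line_width and
fits are hypothetical.

```python
import unicodedata

TAB_STOP = 8     # item 2: tab stops every 8 columns
MAX_COLUMN = 79  # item 3: preformatted lines break at column 79 or less


def cell_width(ch):
    """Cells one character occupies on a character-cell display (assumed mapping)."""
    if unicodedata.combining(ch):
        return 0  # assumption: a combining mark shares its base character's cell
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2  # assumption: Wide/Fullwidth characters take a double-width cell
    return 1


def line_width(line):
    """Columns a line occupies, advancing HT (U+0009) to the next tab stop."""
    col = 0  # number of columns used so far
    for ch in line:
        if ch == "\t":
            col = (col // TAB_STOP + 1) * TAB_STOP
        else:
            col += cell_width(ch)
    return col


def fits(line):
    """True if the line stays within the proposed 79-column limit."""
    return line_width(line) <= MAX_COLUMN
```

Even this much already needs per-character width tables, which is the
kind of dependence on the character-cell model described above.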

>
> 4. Paragraph breaks are indicated by two successive Line Separators
> or by Paragraph Separator, U+2029.

If we are supporting Unicode and have a notion of Paragraph, it seems
reasonable to specify that it is denoted by U+2029.
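
For comparison, a minimal sketch of what a reader has to do under item 4
as proposed, where a paragraph break is either U+2029 or two successive
U+2028. The function name split_paragraphs is hypothetical, and dropping
empty paragraphs is an assumption of this sketch, not something the
proposal states.

```python
import re

LINE_SEP = "\u2028"  # Line Separator
PARA_SEP = "\u2029"  # Paragraph Separator

# Item 4: a paragraph break is U+2029 or two successive U+2028.
_PARA_BREAK = re.compile(f"{PARA_SEP}|{LINE_SEP}{LINE_SEP}")


def split_paragraphs(text):
    """Split text into paragraphs, discarding empty ones (an assumption)."""
    return [p for p in _PARA_BREAK.split(text) if p]
```

With U+2029 alone the regular expression collapses to a plain
text.split(PARA_SEP), and a run of three Line Separators no longer needs
an interpretation.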

>
> 5. Hard page breaks are indicated by FF, U+000C.

Geoffrey



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:48 EDT