RE: Unicode plain text

From: Murray Sargent (murrays@microsoft.com)
Date: Thu May 22 1997 - 21:27:26 EDT


I think page breaks given by <FF> (0xC) belong in the block separator
category and imply an end of paragraph. Page breaks that come in the
middle of a paragraph or word should be called _soft_ page breaks, much
as we have soft line breaks. We could talk about adding an optional
page break analogous to the optional hyphen (0xAD), but computer
folklore over the years clearly indicates that <FF> shouldn't be
overloaded for this purpose. (Offhand, I don't think an optional
page break would be a useful code to have, since you'd really like to
have the semantic "eject if within n lines of the page bottom." Such a
semantic requires the number n, which doesn't fit into a single code
position.)
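
To make the "needs a number n" point concrete, here is a minimal sketch
in Python. Everything in it is an assumption for illustration: the
function name paginate, the page_length parameter, and the choice to
represent a conditional eject as a bare integer n in the input are all
hypothetical, and none of this is part of Unicode. I read "within n
lines of the page bottom" as "n or fewer lines remain on the page".

```python
FORM_FEED = "\x0c"  # <FF>, 0xC


def paginate(items, page_length):
    """Lay out items as plain text, one page being page_length lines.

    Each item is either a str (one output line) or an int n, a
    hypothetical request meaning "eject if within n lines of the
    page bottom". The request carries a parameter, which is exactly
    what a single code position cannot do.
    """
    out = []
    row = 0  # lines already placed on the current page
    for item in items:
        if isinstance(item, int):
            # Conditional eject: break only if n or fewer lines remain
            # and the page is not already empty.
            if row > 0 and page_length - row <= item:
                out.append(FORM_FEED)
                row = 0
            continue
        if row == page_length:
            # Page is full: this is a soft page break.
            out.append(FORM_FEED)
            row = 0
        out.append(item + "\n")
        row += 1
    return "".join(out)
```

With page_length=4, the input ["a", "b", "c", 2, "d", "e"] ejects
before "d" because only one line remains on the page, while
["a", 2, "b"] does not eject at all.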

Murray

> -----Original Message-----
> From: Unicode Discussion [SMTP:unicode@unicode.org]
> Sent: Thursday, May 22, 1997 4:00 PM
> To: Multiple Recipients of
> Subject: Re: Unicode plain text
>
> > How do record-oriented file systems fit into this discussion?
> > (Remember those file systems that ruled the world before the UNIX
> > idea of the byte stream came along...)
> >
> They are far from dead; IBM VM/CMS and Digital (Open)VMS, to name
> two, are still widespread. But VM/CMS and other IBM mainframe
> and midrange operating systems use EBCDIC text encoding and I am
> not aware of any movement to support Unicode in this setting,
> at least not internally.
>
> In VMS, most text files are record-oriented -- usually variable
> length records, with end of line *implied* for each record, but
> not recorded in any particular format. This is actually quite a
> sensible approach, given the wide variety of text-stream formats
> that abound for no good reason.
>
> In principle, it should be just as possible to fill records with
> Unicode as it is to fill them with ASCII, Latin-1, or JIS X 0208.
>
> The VMS file system also supports the notion of "carriage control",
> of which there are many types (like the once-familiar Fortran
> Hollerith style, in which the first character specified whether the
> line was to overprint the previous line, appear on the next line,
> appear 2 lines down, etc., or start on a new page). The carriage
> control information, again, is separate from the file's data. So
> again, in principle, there should be no clash with Unicode.
>
> In fact, I think a VMS implementation of Unicode text might be an
> interesting exercise. But this too raises the question of how to
> map Unicode plain text into this environment, which in turn calls
> for a Unicode plain-text standard for such things as page breaks.
>
> And no, I don't think this brings us anywhere near any slippery
> slopes. Page breaks have been an integral part of plain text since
> the 1950s, when we were programming IBM 409 Electric Accounting
> Machines by sticking little wires into plugboards.
>
> - Frank
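
The Fortran-style carriage control described in the quoted message can
be sketched as a mapping from records to a plain-text stream. This is
only an illustration under stated assumptions: it uses the conventional
control characters (blank, "0", "1", "+"), the function name
asa_to_stream is hypothetical, the carriage control is taken from the
first character of each record as in the Fortran convention, unknown
control characters are treated as single spacing, and overprinting is
approximated with a bare carriage return.

```python
FORM_FEED = "\x0c"  # <FF>, 0xC

# Carriage movement performed BEFORE each record's text is printed.
MOVES = {
    " ": "\n",        # appear on the next line
    "0": "\n\n",      # appear 2 lines down
    "1": FORM_FEED,   # start on a new page
    "+": "\r",        # overprint the previous line
}


def asa_to_stream(records):
    """Convert carriage-controlled records into one text stream.

    The first character of each record selects the movement; the
    rest is the line's text. End of line is implied by the record
    boundary, so no terminator is stored in the records themselves.
    """
    out = []
    for rec in records:
        ctl, text = rec[:1], rec[1:]
        # Assumption: anything unrecognized behaves like single spacing.
        out.append(MOVES.get(ctl, "\n") + text)
    return "".join(out)
```

For example, the records "1TITLE", " line one", "0line three" and
"+_________" become a form feed and TITLE, then "line one" on the next
line, then "line three" two lines down with the underscores printed
over it.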


