Re: Unicode plain text

From: Kenneth Whistler
Date: Thu May 22 1997 - 21:28:46 EDT

> > How do record oriented file systems fit into this discussion ?
> > (Remember those file systems that ruled the world before the UNIX
> > idea of the byte stream came along...)
> >
> In principle, it should be just as possible to fill records with
> Unicode as it is to fill them with ASCII, Latin-1, or JIS X 0208.

And in practice. The portable Unicode backend library I have
written merrily reads and writes Unicode plain text on MVS and
VMS file systems through standard C file interfaces. No problem.
I just don't depend on MVS or VMS to provide any specific interpretation
of *anything* in those files, nor would I want to if the library is to
stay portable.
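
To make the idea concrete, here is a minimal sketch of that approach in
Python rather than C. It is not the library mentioned above. The
function names, the choice of big-endian UTF-16 without a byte order
mark, and the use of LF as the line separator are all assumptions made
for illustration. The point is only that the file is opened in binary
mode, so the operating system sees opaque bytes and the application
alone decides what they mean.

```python
import os
import tempfile

# Assumption for this sketch: a fixed byte order and no BOM, so the
# file content does not depend on the host platform.
ENCODING = "utf-16-be"


def write_text(path, lines):
    """Write lines as Unicode plain text; the OS sees only bytes."""
    data = "\n".join(lines).encode(ENCODING)
    # Binary mode: no newline translation or other OS interpretation.
    with open(path, "wb") as f:
        f.write(data)


def read_text(path):
    """Read the bytes back and interpret them ourselves."""
    with open(path, "rb") as f:
        data = f.read()
    return data.decode(ENCODING).split("\n")


def roundtrip_demo():
    """Write a few lines to a temporary file and read them back."""
    lines = ["Unicode plain text", "Ελληνικά", "日本語"]
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_text(path, lines)
        return read_text(path)
    finally:
        os.remove(path)
```

Calling roundtrip_demo() should return the same three lines that were
written, whatever the underlying file system does with records.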

> The VMS file system also supports the notion of "carriage control",
> of which there are many types (like the once-familiar Fortran
> Hollerith style, in which the first character specified whether the
> line was to overprint the previous line, appear on the next line,
> appear 2 lines down, etc, or start on a new page). The carriage
> control information, again, is separate from the file's data. So
> again, in principle, there should be no clash with Unicode.
> In fact, I think a VMS implementation of Unicode text might be an
> interesting exercise.

Only *interesting* in the sense you mean if you depended on VMS
for anything other than basic system services underneath a C
library. To be portable, everything else would be built on layers
of support libraries independent of VMS.
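
As an illustration of such a layer, the sketch below interprets the
Fortran-style carriage control described in the quoted text entirely in
application code. It assumes the conventional control characters (blank
for single spacing, "0" for double spacing, "1" for a new page, "+" for
overprinting). The function name and the mapping of overprint to a bare
CR are choices made for this example, not anything VMS prescribes.

```python
def render_fortran_records(records):
    """Turn carriage-control records into one plain-text stream.

    The first character of each record is the control character and
    the rest is the text of the line.
    """
    out = []
    for i, rec in enumerate(records):
        control, text = rec[:1], rec[1:]
        if control == "+":
            prefix = "\r"      # overprint: back to line start, no advance
        elif control == "0":
            prefix = "\n\n"    # double space: skip a line first
        elif control == "1":
            prefix = "\f"      # start a new page
        else:
            prefix = "\n"      # blank (or unknown): advance one line
        # Simplification: the first record needs no leading advance.
        if i == 0 and prefix.startswith("\n"):
            prefix = prefix[1:]
        out.append(prefix + text)
    return "".join(out)
```

For example, the records "1TITLE", " line one", "0line two" and
"+_________" come out as a form feed, the title, two lines separated by
a blank line, and an underscore run overstruck on the last line.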

> But this too begs the question of how to
> map Unicode plain text into this environment, which in turn calls
> for a Unicode plain-text standard for such things as page breaks.

I agree with Tim that page breaks are on the slippery slope to pretty
text. Pagination is not necessary for legibility of plain text in
the same sense that line breaking (forced in some instances) or
paragraph breaking (required, among other things, for bidirectional
control) is. Furthermore, since pagination assumes much more
about actual rendering devices, forced pagination is as often a
source of illegibility as of legibility. (Think of all those
preformatted documents you've seen at one time or another that, on
your device, display or print with one or two lines spilled over to
the next page for each forced page.) I suspect that the device
dependency of pagination is one of the reasons why HTML has no
built-in concept of a page break on display and does not use FF for one.
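
The spill-over effect is easy to reproduce. The sketch below is only an
illustration under assumed numbers: a document preformatted with an FF
after every 60 lines, sent to a device that fits 58 lines on a page.
The function name and both page lengths are hypothetical.

```python
def physical_pages(text, device_lines_per_page):
    """Split text into the pages a device would actually produce.

    Each FF forces a new page, and the device also starts a new page
    whenever its own page length is exhausted.
    """
    pages = []
    for logical_page in text.split("\f"):
        lines = logical_page.split("\n")
        for start in range(0, len(lines), device_lines_per_page):
            pages.append(lines[start:start + device_lines_per_page])
    return pages


def make_document(num_pages, lines_per_page):
    """Build a document with a forced page break after each page."""
    return "\f".join(
        "\n".join(f"page {p} line {n}" for n in range(lines_per_page))
        for p in range(num_pages)
    )
```

With make_document(3, 60) and a 58-line device, physical_pages yields
six pages of 58, 2, 58, 2, 58 and 2 lines. The same document on a
60-line device yields the intended three.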

> And no, I don't think this brings us anywhere near any slippery slopes.
> Page breaks have been an integral part of plain text since the 1950s
> when we were programming IBM 409 Electric Accounting Machines by
> sticking little wires into plugboards.

Again, think device dependency here. FF used to literally be the
electronic control for the "Form Feed" on a particular device. It
drove a mechanism that shoved paper out and pulled new paper in.

In modern Page Description Languages such as PostScript, an operator
such as showpage is a high-level operation that dumps a frame buffer
to a smart raster device. Trying to control such operations by
embedding an FF control character in plain text is pretty klutzy.


> - Frank
