Re: Unicode plain text

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Thu May 22 1997 - 19:00:11 EDT


> How do record oriented file systems fit into this discussion ?
> (Remember those file systems that ruled the world before the UNIX
> idea of the byte stream came along...)
>
They are far from dead; IBM VM/CMS and Digital (Open)VMS, to name
two, are still widespread. But VM/CMS and other IBM mainframe
and midrange operating systems use EBCDIC text encoding and I am
not aware of any movement to support Unicode in this setting,
at least not internally.

In VMS, most text files are record oriented -- usually variable
length records, with end of line *implied* for each record, but
not recorded in any particular format. This is actually quite a
sensible approach, given the wide variety of text-stream formats
(LF on UNIX, CRLF on DOS, CR on the Macintosh, and so on) that
abound for no good reason.

In principle, it should be just as possible to fill records with
Unicode as it is to fill them with ASCII, Latin-1, or JIS X 0208.
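To make the point concrete, here is a minimal sketch of record-oriented
text holding Unicode. The layout is an assumption, loosely modeled on the
VMS variable-length record format: a 2-byte little-endian length count
per record, the record data, and a pad byte to keep records on 16-bit
boundaries. The function names are hypothetical. No end-of-line bytes
are stored; the line boundary is implied by the record itself, so the
same code works whether the records hold ASCII, Latin-1, UTF-8, or
UTF-16.

```python
import struct

def pack_records(lines, encoding="utf-8"):
    """Pack text lines into length-prefixed records (hypothetical layout).

    No end-of-line bytes are written: each record *is* a line.
    """
    out = bytearray()
    for line in lines:
        data = line.encode(encoding)
        if len(data) > 0xFFFF:
            raise ValueError("record too long for a 2-byte count")
        out += struct.pack("<H", len(data))  # 2-byte little-endian length
        out += data
        if len(data) % 2:                    # pad odd lengths to a 16-bit boundary
            out += b"\x00"
    return bytes(out)

def unpack_records(blob, encoding="utf-8"):
    """Recover the lines; end of line is implied by each record boundary."""
    lines = []
    pos = 0
    while pos < len(blob):
        (length,) = struct.unpack_from("<H", blob, pos)
        pos += 2
        lines.append(blob[pos:pos + length].decode(encoding))
        pos += length + (length % 2)         # skip the pad byte, if any
    return lines
```

For example, `unpack_records(pack_records(["café", "日本語"]))` gives back
the original two lines, and passing `encoding="utf-16-le"` to both calls
works the same way, since the record structure never looks inside the
data.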

The VMS file system also supports the notion of "carriage control",
of which there are many types (like the once-familiar Fortran
Hollerith style, in which the first character specified whether the
line was to overprint the previous line, appear on the next line,
appear 2 lines down, etc., or start on a new page). The carriage
control information, again, is separate from the file's data. So
again, in principle, there should be no clash with Unicode.
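As an illustration of how carriage control stays separate from the
data, here is a minimal sketch that turns records carrying classic
FORTRAN carriage-control characters into stream text. It assumes the
four common codes (blank = next line, "0" = two lines down, "1" = new
page, "+" = overprint) and maps them to LF, LF LF, LF FF, and CR
respectively, much as the POSIX `asa` utility does; that mapping, the
treatment of unknown codes as a single line advance, and the function
name are assumptions, not a VMS-defined conversion. Note that the
"new page" case needs a plain-text page break, here U+000C (form feed).

```python
# Movement performed *before* printing a record, keyed on its first character.
# The mapping to control characters is an assumption for illustration.
CARRIAGE_CONTROL = {
    " ": "\n",      # advance one line
    "0": "\n\n",    # advance two lines (double spacing)
    "1": "\n\f",    # end the current line, then skip to a new page (U+000C)
    "+": "\r",      # return to column 1 without advancing: overprint
}

def fortran_cc_to_stream(records):
    """Convert carriage-control records to LF-terminated stream text (hypothetical)."""
    parts = []
    for rec in records:
        cc, text = rec[:1], rec[1:]
        # Unknown or missing codes are treated as a single line advance (assumption).
        parts.append(CARRIAGE_CONTROL.get(cc, "\n") + text)
    result = "".join(parts)
    # Printing starts as if just after an imaginary line 0, so drop the
    # one newline that would otherwise precede the first line.
    if result.startswith("\n"):
        result = result[1:]
    return result + "\n"
```

For instance, the records `"1TITLE"`, `" line one"`, `"0line three"`,
`"+_____"` become a form feed, the title, the first line, a blank line,
the third line, and an underscore overprint of that third line.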

In fact, I think a VMS implementation of Unicode text might be an
interesting exercise. But this too raises the question of how to
map Unicode plain text into this environment, which in turn calls
for a Unicode plain-text standard for such things as page breaks.

And no, I don't think this brings us anywhere near any slippery slopes.
Page breaks have been an integral part of plain text since the 1950s
when we were programming IBM 409 Electric Accounting Machines by
sticking little wires into plugboards.

- Frank


