Re: Unicode selections for X11 (cont'd)

From: Juliusz Chroboczek (jec@dcs.ed.ac.uk)
Date: Wed Jun 30 1999 - 15:39:28 EDT


>> I've got a question about the C0 and C1 control character ranges.
>> I call them `legacy control characters'. Do people object to this
>> terminology?

Frank da Cruz <fdc@watsun.cc.columbia.edu>:

FdC> I hope so! The word "legacy" is emotionally toned and
FdC> value-laden. It denigrates 30+ years of computing practice and
FdC> standards activities, and it implies that plain text is a relic
FdC> of the past to be discarded with all possible haste,

It cannot be said that the C0 and C1 control characters are the
greatest achievement of these ``30+ years etc.''

FdC> In fact, plain text is the only immutable format in computing.

Agreed. And the only reason it is not portable is the poor
standardisation of the C0 and C1 control characters.

I've seen the following forms of plain text:

  NL is a line break, there are no paragraphs: Unix.
  NL is a line break, NL NL is a paragraph separator: Unix.
  NL is a paragraph separator, line breaks are implicit: ports of
    MS-DOS applications to Unix.
  CR LF is a line break: MS-DOS.
  CR LF is a paragraph separator, line breaks are implicit: MS-DOS.
  CR LF is a paragraph separator, CR (or was it LF?) is a line break:
    MS-DOS.
  CR is a line break: MacOS.
  CR is a paragraph separator: MacOS.

without counting, of course, systems on which record information is
kept out-of-band (such as VMS).
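
To make the ambiguity concrete, here is a minimal Python sketch that
maps some of the conventions listed above onto the Unicode LINE
SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029). The convention
names and the function name are invented for this illustration; the
``CR (or was it LF?)'' variant is left out because the details are
uncertain. Note that the caller has to say which convention applies:
the text alone does not tell you.

```python
LS = "\u2028"  # Unicode LINE SEPARATOR
PS = "\u2029"  # Unicode PARAGRAPH SEPARATOR

# Hypothetical convention names (not from any standard). Each maps to an
# ordered list of (old, new) replacements; longer sequences come first
# so that "\n\n" is handled before a lone "\n".
CONVENTIONS = {
    "unix-lines": [("\n", LS)],
    "unix-blank-line-paragraphs": [("\n\n", PS), ("\n", LS)],
    "unix-nl-paragraphs": [("\n", PS)],
    "dos-lines": [("\r\n", LS)],
    "dos-crlf-paragraphs": [("\r\n", PS)],
    "mac-lines": [("\r", LS)],
    "mac-cr-paragraphs": [("\r", PS)],
}


def to_unicode_separators(text, convention):
    """Rewrite the control-character breaks in `text` as U+2028/U+2029,
    interpreting them according to the named convention."""
    for old, new in CONVENTIONS[convention]:
        text = text.replace(old, new)
    return text
```

The same input, say "a\nb", comes out with a line separator under
"unix-lines" and a paragraph separator under "unix-nl-paragraphs",
which is exactly the portability problem.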

>> Does anyone have a better name?

FdC> C0 and C1 control characters. These are ISO standard character
FdC> sets and ISO-standard terminology is available to refer to them.

Okay. Changed.

FdC> Finally, please remember that Unicode is a plain-text standard.
FdC> The control characters are there for a reason: you need them in
FdC> plain text.

You need a paragraph separator and possibly a line break (and perhaps
a page break). Unicode defines well-standardised codepoints for
those. If you use other control characters, such as SO/SI for
controlling boldface or italics, or BS (or CR) for overstriking, or
terminal control sequences, it ain't plain text no more.
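
As a rough illustration of that distinction, the following Python
sketch reports any C0 (U+0000 to U+001F) or C1 (U+0080 to U+009F)
control characters in a string other than a small allowed set. The
function name and the default allowed set (LF, CR, FF) are assumptions
made for this example, not part of any standard; adjust the set to
taste (HT, for instance, is flagged by default here).

```python
def c0_c1_controls(text, allowed=frozenset("\n\r\f")):
    """Return (index, code point) pairs for every C0 or C1 control
    character in `text` that is not in `allowed`.

    The default `allowed` set is an assumption for this sketch.
    """
    found = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        is_c0 = cp <= 0x1F           # C0 range: U+0000..U+001F
        is_c1 = 0x80 <= cp <= 0x9F   # C1 range: U+0080..U+009F
        if (is_c0 or is_c1) and ch not in allowed:
            found.append((i, cp))
    return found
```

Text that uses BS for overstriking or SO/SI for font switching yields a
non-empty list; text that relies only on the Unicode separators yields
an empty one.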

                                        J.


