Re: Feedback from C1 Control Pictures Proposal

From: Frank da Cruz <fdc_at_columbia.edu>
Date: Mon, 22 Aug 2011 10:07:48 EDT

> I would like to ask Frank for a bit of help here (and, to the extent that
> Ken thinks that the proposal is reasonable, some affirmation that the
> uses/demonstration of demand will be seen as acceptable to the Unicode
> people). Specifically, can Frank help identify, and possibly provide
> screenshots, of:
>
> - C0 control pictures in use
> - C1 control pictures in use
>
Maybe only an older person would understand this point, but to emulate a
particular terminal, you have to make the emulator show on the screen what
the real terminal would show. I have been laid off and have to clean out my
office this week, so I don't have time to redo the research, and my vast
collection of terminal manuals has been boxed for shipment to the Computer
History Museum:

  http://www.columbia.edu/cu/computinghistory/books/#terminalmanuals

Still, many terminals have glyphs for C0 controls, and some have them for C1
controls. Here's the exhibit I prepared for my proposal in 1998:

  ftp://kermit.columbia.edu/kermit/ucsterminal/terminal-exhibits.pdf

Here again, for reference, is the proposal itself (only the C1 part is
relevant to this discussion):

  ftp://kermit.columbia.edu/kermit/ucsterminal/control.txt

The exhibit shows:

Terminals that have C0 control glyphs:
  DEC VT320, 420, etc
  Data General Dasher
  HP-2621
  Wyse 60
  Wyse 370
  Atlantic Research Corporation Interview 30A Data Analyzer (exhibit N1)

Terminals that have C1 control glyphs:
  DEC VT320, 420, etc (full set)
  Data General Dasher (partial set)
  Siemens-Nixdorf 97801 (as hex byte pictures 80, 81, etc)
  Wyse 370 (full set)

This is not an exhaustive survey; it is more of an existence proof.

> * Unfortunately, I don't actually know of any applications, other than
> Penango (my company's primary product), which currently use the U+2400
> range. [That is what kicked off this proposal, by the way.]
>
I don't have information about what applications use them. Our own terminal
emulator, Kermit 95:

  http://www.columbia.edu/kermit/k95.html

does not. That's because it was designed to be portable between Windows
console screens and GUI screens, and no Windows console font contained
control pictures. Instead, when we put the emulator into debugging mode, it
uses color to mark control characters. Obviously, that's not plain text, but
this way each control character is shown in a single cell.

By now, Kermit 95 would indeed use control pictures in its GUI version,
except that the programmers aren't here any more, and except that C1 control
pictures are not defined yet. By the way, the cancellation of the Kermit
Project is not an end but a new beginning, because now the source code
for Kermit 95 has been published with an Open Source license:

  http://www.columbia.edu/kermit/k95sourcecode.html

So here is why I believe it is important to have C1 control character glyphs
available in Unicode:

 . Terminal emulation is still important. For example, everybody who uses
   the Unix shell is doing so through a terminal emulator. And here, as we
   all know, is where the real work gets done -- coding, website creation
   and maintenance, system administration, network configuration, etc etc.

 . Since the Unix shell and other text-only online environments exist
   outside the English-speaking world too, terminal emulators are being
   updated to support UTF-8. Kermit 95 has supported it since about 2002.
   The Linux console window (which is a terminal emulator) uses UTF-8 *by
   default*.

 . The terminals that are emulated were manufactured before 1995, and
   therefore mostly follow the ANSI X3.64 definition, which reserves both
   C0 and C1 for control characters, as Unicode itself has done.

 . But Microsoft has created code pages that are otherwise identical to ISO
   standard character sets such as ISO-8859-x (which are compatible with ANSI
   X3.64), but with graphic characters in the C1 area. These have leaked into
   every part of the Internet, including text that we view on a terminal
   screen (e.g. email).

 . When a real terminal, or a program that emulates one, receives text
   written in, say, Microsoft code page 1252, it is liable to hang. Why?
   Because the text contains "smart quotes" or somesuch, which coincide with
   valid C1 commands understood by the terminal. Some of these, such as
   ISO 6429 DCS, OSC, or APC, are the header for a "packet" of control
   information. The terminal waits for the end-of-packet marker of the
   control sequence, as it must, but it never comes.
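
The hang described in the last point can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions, not how
any particular terminal or Kermit 95 is implemented: the helper name
first_unterminated and the sample string are made up, only the 8-bit forms
of the controls are considered, and cancellation by CAN, SUB, or ESC is
ignored. It scans a byte stream for a C1 control that opens a control string
(DCS, SOS, OSC, PM, APC) and reports the one still waiting for ST when the
data runs out.

```python
# 8-bit C1 controls that open a control string (ISO 6429 / ECMA-48).
STRING_OPENERS = {0x90: "DCS", 0x98: "SOS", 0x9D: "OSC", 0x9E: "PM", 0x9F: "APC"}
ST = 0x9C  # String Terminator, the "end-of-packet" the terminal waits for


def first_unterminated(data: bytes):
    """Return (offset, name) of a control string still open at end of data.

    Hypothetical helper for illustration; returns None if nothing is pending.
    """
    pending = None
    for offset, byte in enumerate(data):
        if pending is None:
            if byte in STRING_OPENERS:
                pending = (offset, STRING_OPENERS[byte])
        elif byte == ST:
            pending = None
    return pending


# In code page 1252 the letter U+017E (z with caron) is byte 0x9E, which an
# ISO 6429 terminal reads as PM (Privacy Message), the start of a control string.
sample = "the word \u017eivot".encode("cp1252")
print(first_unterminated(sample))
```

For the sample, the helper reports a pending PM at offset 9: from that byte
on, a conforming terminal would be swallowing text while it waits for an ST
that the sender never intended to send.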

Those who support terminal emulators need tools to diagnose problems like
this. The best and most portable tool is to put the terminal into "display
controls" mode, a feature that the above-mentioned terminals had. A
Unicode-based terminal emulator has glyphs to show the C0 controls but not
the C1 controls, which can be even more lethal than the C0 ones when used
improperly, as they are in Windows code pages.
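
A "display controls" mode along these lines might look like the following
Python sketch. It is an illustration under stated assumptions rather than
the behavior of any of the terminals above: display_controls is a made-up
name, and the <9B>-style fallback for C1 is just a stand-in chosen here
because Unicode has no C1 pictures to use. C0 controls map onto the existing
Control Pictures block at U+2400.

```python
def display_controls(text: str) -> str:
    """Replace control characters with visible stand-ins (illustrative only)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            # C0 controls: U+2400..U+241F are the matching control pictures.
            out.append(chr(0x2400 + cp))
        elif cp == 0x7F:
            # DEL has its own picture, U+2421 SYMBOL FOR DELETE.
            out.append("\u2421")
        elif 0x80 <= cp <= 0x9F:
            # C1 controls: no standard picture exists, so this sketch falls
            # back to a multi-character hex notation (an assumption).
            out.append(f"<{cp:02X}>")
        else:
            out.append(ch)
    return "".join(out)


print(display_controls("a\x1b[0m\r\n"))  # C0: ESC, CR, LF become single glyphs
print(display_controls("\x9b1m"))        # C1: CSI falls back to "<9B>"
```

Each C0 control occupies one cell, while the C1 fallback takes four, which
is the gap that C1 control pictures would close.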

Note that tech support is done not only on the scene, but remotely. Support
technicians need to be able to ask users over the phone, "What do you see?"
Right now there are no glyphs in Unicode that could be used for this
purpose. This is why I proposed C1 control pictures, and also why I
proposed Hex byte pictures. People can read them and say what they see,
because they are just Roman letters and digits.

Or, they can copy the screen and paste it into an email to make a problem
report. This is why the code points should be standardized: the recipient
of the email should be able to see the same glyph that the sender saw. And
by the same token, debugging techniques can be documented in plain text,
with examples.

Frank da Cruz
http://www.columbia.edu/~fdc/
Received on Mon Aug 22 2011 - 11:38:03 CDT
