Collected Comments on Terminal Graphics Proposal

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Wed Oct 07 1998 - 20:07:15 EDT


Thanks to all who commented on the Terminal Graphics proposal. Here are
some collected responses to particular points.

Geoffrey Waigh <gpw@cybersurf.net> wrote:
> > A selection of terminal graphics characters is proposed for Unicode [24]
> > and ISO 10646 [19] to allow Unicode-based terminal emulation software to
> > (a) display glyphs that are found on popular types of terminals but
> > currently are not available in Unicode, and (b) interoperate with other
> > Unicode applications.
>
> I can see clear merit in handling b), but I'm leery of the code space
> consumption that a) is having here. In general, my feeling is that
> if 98% emulation does the job in an adequate fashion for
> non-perfectionists, then that is the way to go.
>
When a company is in the market for a terminal emulator, one of the factors
affecting their choice is the quality of the emulation, which includes the
ability to display all the same glyphs the terminal displays. If product A
can do this (but has to use a custom font to do so) and product B is a good
citizen and sticks with Unicode -- which prevents it from displaying the same
glyphs properly -- many companies will choose product A because its emulation
is better, even though they might suffer down the road with its nonstandard
encodings -- and maybe even total lack of Unicode support.

> [On Hex code display]
>
> That seems kind of wasteful for a debugging mode. Do the terminals
> that produce this output have escape sequences for enabling this
> mode, or is it strictly a terminal configuration option? (Of course
> by that measure the control character codes come under scrutiny...)
>
This is the largest block in the proposal, and it can be dispensed
with. I do believe, however, that many developers, help-desk people, network
managers, etc, will find it handy in debugging not only terminal sessions but
Web pages, word processors, network protocols, and files using Unicode-based
tools.

Kevin Bracey <kbracey@acorn.com> wrote:
> > Unicode already has a block of Control Pictures at U+2400 through
> > U+2421, but (except for "NL" at U+2424) these go horizontally across the
> > character cell, rather than diagonally, thus making them difficult to
> > distinguish from normal alphanumeric text. A new, parallel block of C0
> > control pictures is needed in which the abbreviations are displayed
> > diagonally.
>
> That's a glyph variation - the Unicode Standard explicitly states that you
> can use whatever preferred glyph you like for these. Indeed, IIRC, ISO
> 10646-1 has considerably different suggested glyphs for these characters.
>
(And many others concurred.) OK, this block is removed from Draft 2 of the
proposal, but some suggestions have been added for the next edition of the
Unicode Standard.
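
As an illustration of how the existing Control Pictures block can be used
as-is, here is a minimal Python sketch. It is not part of the proposal, and
the helper names control_picture and visualize are hypothetical. It relies
only on the standard layout in which U+2400 through U+241F picture the C0
controls 00 through 1F, U+2420 pictures space, and U+2421 pictures DEL. The
angle-bracket hex fallback for all other non-printable bytes is an arbitrary
choice made for this sketch.

```python
def control_picture(byte: int) -> str:
    """Return a visible stand-in for one byte value (hypothetical helper)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte value (0..255)")
    if byte <= 0x1F:
        # C0 controls map one-to-one onto U+2400..U+241F.
        return chr(0x2400 + byte)
    if byte == 0x20:
        return "\u2420"  # SYMBOL FOR SPACE
    if byte == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    if 0x21 <= byte <= 0x7E:
        return chr(byte)  # printable ASCII is shown unchanged
    # Unicode has no C1 control pictures, so everything else falls back to
    # a hex pair in angle brackets (an assumption of this sketch).
    return f"<{byte:02X}>"


def visualize(data: bytes) -> str:
    """Render a byte string with every control character made visible."""
    return "".join(control_picture(b) for b in data)


if __name__ == "__main__":
    print(visualize(b"Hello, world\r\n\x1b[0m"))
```

Whether the resulting glyphs appear horizontally or diagonally is then purely
a matter of the font in use, which is the point made in the comments above.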

Asmus Freytag <asmusf@ix.netcom.com> wrote [On the same topic...]:
> And thus, at minimum, the table in the book should be altered to show all
> control pictures arranged diagonally, and all future control picture
> additions should also be arranged that way.

We are looking into this for Unicode 3.0. Although the mail discussion makes
clear that the distinction between characters and glyphs is widely known, it
makes no sense to depart from the established use in the one area the
characters are intended for! Since the two glyph forms are equivalent
(i.e. there's no question of changing the identity of the characters) such a
change is editorial in nature. For what it's worth, ISO 10646 uses the
diagonal forms (although incorrectly in a roman type face).

Kevin Bracey <kbracey@acorn.com> wrote:
> > E080 SP Space (like U+2420 but arranged diagonally)
> > E081 DEL Delete (Rubout) (2-character name: DT)
>
> These two are glyph variants of U+2420 and U+2421.
>
OK, these are removed too.

> > E082 LS1 Locking Shift 1 (ISO name for SO)
> > E083 LS0 Locking Shift 0 (ISO name for SI)
>
> Maybe these two could be considered glyph variants of U+240E and U+240F?
> Probably not, I suppose.
>
I've left them in, along with IS1 through IS4.

> > E0F0 Reverse Question Mark DEC VTxxx, Wyse, Televideo (1)
>
> I would suggest U+FFFD for this.
>
This was discussed at some length, but I've left it in, since many terminals
display this glyph, and for different purposes. It does not always mean
"unknown character received".

> Even ISO-Latin1 contains the reverse question mark at 0xBF, so there is no
> need to re-invent it.
>
As noted in previous postings, the ISO one is upside down, whereas this one
is upright.

Asmus Freytag <asmusf@ix.netcom.com> wrote:
> This important character [reverse question mark] is already on the list of
> characters to be added in one of the coming amendments in ISO 10646.

kenw@sybase.com (Kenneth Whistler) wrote:
> As Asmus mentioned, this one is already on its way. It is encoded in
> Amendment 18 to 10646, which is just entering its last round of balloting:
>
> U+2426 SYMBOL FOR SUBSTITUTE FORM TWO
>
> with the requisite shape of the reversed question mark.
>
Thanks; draft 2 amended accordingly.

Rick McGowan <rmcgowan@apple.com> wrote:
> Of course, there are a *lot* of controls, many control sets, and some
> degree of overlap, as Frank's proposal points out rather dramatically. I
> would suggest that he take up an attempt at serious unification of these
> things, and collect all of the wonderful data he's gathered into a "white
> paper" on how to use control pictures for what terminals, etc. With
> mapping tables, and a list of the minimum required additions to support
> full cross-mappings.
>
I have tried to do this in Draft 2.

Paul Keinanen <keinanen@sci.fi> wrote:
> If all octet values (00 .. FF) are also going to be displayed, there might
> be some ambiguity with some of the two letter codes, e.g. FF, D1, D2, D3,
> D4, EB and EC, which should be noted in the actual font design.
>
Thanks for noticing! A caution to this effect has been added to Draft 2.
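
The overlap Paul describes can be checked mechanically. The sketch below is
only an illustration: the list of two-character C0 abbreviations in it is an
assumed one, written out for this example, and the authoritative names are
whatever the proposal's own tables give. The function name ambiguous_names is
likewise hypothetical. It simply reports which abbreviations are also valid
pairs of uppercase hex digits.

```python
import re

# Assumed two-character abbreviations for the 32 C0 controls. This list is
# for illustration only and is not copied from the proposal.
TWO_CHAR_C0 = [
    "NU", "SH", "SX", "EX", "ET", "EQ", "AK", "BL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DL", "D1", "D2", "D3", "D4", "NK", "SY", "EB",
    "CN", "EM", "SB", "EC", "FS", "GS", "RS", "US",
]

# A hex-byte glyph shows exactly two uppercase hexadecimal digits.
HEX_PAIR = re.compile(r"[0-9A-F]{2}")


def ambiguous_names(names):
    """Return the names that could be misread as a hex byte value."""
    return sorted(name for name in names if HEX_PAIR.fullmatch(name))


if __name__ == "__main__":
    print(ambiguous_names(TWO_CHAR_C0))
```

With the assumed list above, the result is D1, D2, D3, D4, EB, EC and FF,
the same seven names Paul mentions.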

> > C1 Control characters are specified in ISO-6429 and used in the VT220
> > family of terminals [5] and the Wyse 370 [26], where they are
> > represented in the right half of the "display controls" font as shown in
> > Table 4.3 (DEC terminals use the full name, Wyse terminals use the 2X
> > name). As with C0 controls, the "name" is displayed diagonally within
> > the character cell. Unicode presently includes no C1 control pictures.
>
> Looking through various EBCDIC code pages (e.g. IBM278, IBM880) and other
> unnumbered sets it appears that these control codes are all also available
> in EBCDIC, but of course at different positions (e.g. IND at 0x24). Some
> references to these sets are "IBM NLS RM Vol2 SE09-8002-01, March 1990"
> and "IBM 3270 Char Set Ref Ch 10, GA27-2837-9, April 1987".
>
Thanks for the reference. I found a complete listing of modern EBCDIC
(which has changed considerably since the System/360 days!) in the CDRA
Registry, and have totally revised the EBCDIC controls section in Draft 2.

> >Note that three of the C1 control pictures are unassigned (the ones
> >marked by "(1)", that would be at U+E020, U+E021, and U+E039 if these
> >were assigned). These positions should be left vacant in case names are
> >assigned to these characters in a future revision of ISO 6429.
>
> In ISO 8859-1 these are listed as
>
> 80 PADDING CHARACTER (PAD)
> 81 HIGH OCTET PRESET (HOP)
> 99 SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI)
>
I have both the ISO and ECMA versions of this standard and find no reference
to these or any other control characters. Nor can I find these characters in
ISO 6429 or any of the control sets in the ISO Registry. Can you give a
more precise source?

> Then I am just wondering why:
>
> ftp://dkuug.dk/i18n/charmaps/CP819 (alias Latin1 alias ISO_8859-1:1987)
> lists:
> <PA> /x80 <U0080> PADDING CHARACTER (PAD)
> <HO> /x81 <U0081> HIGH OCTET PRESET (HOP)
> <GC> /x99 <U0099> SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI)
>
> and ftp://dkuug.dk/i18n/charmaps.646/ISO_8859-1:1987
> lists the same code point values for these control characters
>
> <PA> /d128
> <HO> /d129
> <GC> /d153
>
> So I just wonder where they at dkuug.dk/i18n have taken these C0 and C1
> codes from; unfortunately these tables did not contain any references (as
> did most EBCDIC tables).
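
For anyone who wants to compare such charmap files against the standards, the
entries quoted above are easy to parse. The following is a rough sketch that
assumes each entry has exactly the shape shown in the quoted lines (symbol,
/xHH byte, <Uhhhh> code point, name). The real files may contain other line
types, which this sketch silently skips. The names CHARMAP_LINE, SAMPLE and
parse_charmap are hypothetical.

```python
import re

# Matches lines of the form quoted above, e.g.
#   <PA> /x80 <U0080> PADDING CHARACTER (PAD)
CHARMAP_LINE = re.compile(
    r"^<(?P<symbol>[^>]+)>\s+"
    r"/x(?P<byte>[0-9A-Fa-f]{2})\s+"
    r"<U(?P<ucs>[0-9A-Fa-f]{4})>\s+"
    r"(?P<name>.+)$"
)

# The three entries as they appear in the quoted message.
SAMPLE = """\
<PA> /x80 <U0080> PADDING CHARACTER (PAD)
<HO> /x81 <U0081> HIGH OCTET PRESET (HOP)
<GC> /x99 <U0099> SINGLE GRAPHIC CHARACTER INTRODUCER (SGCI)
"""


def parse_charmap(text):
    """Map byte value -> (symbol, code point, name) for recognized lines."""
    entries = {}
    for line in text.splitlines():
        match = CHARMAP_LINE.match(line.strip())
        if match is None:
            continue  # not an entry in the assumed format
        byte = int(match["byte"], 16)
        entries[byte] = (match["symbol"], int(match["ucs"], 16), match["name"])
    return entries


if __name__ == "__main__":
    for byte, (symbol, ucs, name) in sorted(parse_charmap(SAMPLE).items()):
        print(f"{byte:02X}  <{symbol}>  U+{ucs:04X}  {name}")
```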

> > 5. HEX BYTES
> >
> > Hexadecimal byte values, 2 hex digits each. Like display controls, but
> > for all 256 8-bit byte values...
>
> These would be very nice :-). Note the possible ambiguity with some two
> character control pictures e.g. FF, EB etc. So special precautions should be
> taken when designing the fonts.
>
Noted in Draft 2.

Karlsson Kent - keka <keka@im.se> wrote:
> ... (though the newly suggested hexadecimal-digit-pair display ones might
> continue to be useful; though hexadecimal digit quadruples would fill an
> entire plane and more! ;-)

Rick McGowan <rmcgowan@apple.com> wrote:
> > The single biggest category is hex bytes, which so far seems to have
> > received a warm reception.
>
> What does "warm reception" mean?
>
Some nice comments (like the ones just above).

Paul Keinanen <keinanen@sci.fi> wrote:
> > No attempt was made to account for the many Viewdata, Videotex, Minitel,
> > NAPLPS, or other mosaic graphics character sets. These should be
> > tackled, if appropriate, by someone who knows something about them.
>
> And not forgetting the tele-text block characters on European TVs. With
> the introduction of TV cards for PCs that also contain a teletext
> decoder, there is a need to display the text and block graphics on
> PC. As far as I remember, the block graphic format is more or less the
> same as Viewdata with 2 columns and 3 rows per character cell, thus
> requiring 64 glyphs.
>
There are numerous mosaic graphics, Teletex, and similar character sets in
the ISO Register. Quite honestly, I have never even seen such a terminal
and do not feel qualified to propose how/if/when/whether this class of glyphs
should be handled in Unicode.
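
The figure of 64 glyphs follows directly from the cell geometry Paul
describes: 2 columns by 3 rows gives 6 blocks, each on or off, hence 2^6 = 64
patterns. The sketch below merely enumerates them. The bit-to-block ordering
is an arbitrary assumption made for this example, not the coding used by any
teletext or Viewdata standard, and the function name mosaic_patterns is
hypothetical.

```python
COLS, ROWS = 2, 3  # cell geometry as described in the quoted message


def mosaic_patterns():
    """Return every on/off pattern of a 2x3 cell as a list of row strings."""
    patterns = []
    for bits in range(2 ** (COLS * ROWS)):
        rows = []
        for r in range(ROWS):
            # Bit (r * COLS + c) controls the block at row r, column c.
            # This ordering is an assumption of the sketch.
            row = "".join(
                "#" if (bits >> (r * COLS + c)) & 1 else "."
                for c in range(COLS)
            )
            rows.append(row)
        patterns.append(rows)
    return patterns


if __name__ == "__main__":
    all_patterns = mosaic_patterns()
    print(len(all_patterns), "patterns")
    # Show one example cell: the pattern numbered 21 in this sketch's order.
    print("\n".join(all_patterns[21]))
```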

> All in all a very interesting proposal. By using as many existing
> characters from the current Unicode standard, I guess there would be a
> greater likelihood of getting things officially approved.
>
In most places, the proposal does not bother to enumerate all of the characters
used by these terminals that are already in Unicode -- and this evidently
leaves the false impression that they were not researched. Indeed they
were! If it is necessary to get the proposal passed, of course it can be
done.

Rick McGowan <rmcgowan@apple.com> wrote:
> > Still, if I were a font maker working from the Unicode book, I'd
> > probably copy the pictures in it, so again, I'd suggest the next edition
> > show the characters diagonally within the cell, and the accompanying text
> > (which if I can overlook, so can a font maker :-)
>
> Yes, yes, but... People should read, Grasshopper. It is that for which we
> write.
>
Yes, I should know this as well as anyone, having written several books
myself, which serve to varying degrees as software manuals, and which, if
users of the software would only read them, would save me my daily 6-8 hours
of question answering -- hence the smiley :-)

Karlsson Kent - keka <keka@im.se> wrote:
> I probably should not say this, but... If you are absolutely hardbent on
> having symbols for control codes, there should be some for the Unicode
> control codes too (like paragraph separator, left-to-right-mark, etc.)
> They need not be constructed from letters...
>
I have added a section on these to Draft 2. They are not needed for
terminal emulators (at least not yet), but might be handy in other contexts.

Tony Harminc <tzha0@amdahl.com> wrote:
> > E0F6 Padlock (keyboard locked) IBM 3270
>
> This last one introduces a bit of a problem, I think. It differs
> from all other characters mentioned in that it is never displayed in
> the data portion of a 3270 screen, but rather occurs "below the line"
> as an indication of keyboard status. If it is to be included, then
> there are several more uniquely 3270 characters that can be seen
> below the line; I don't know formal names for them, and indeed they
> generally don't appear in IBM's CDRA documents. Roughly, they are:
>
>   Outline up arrow           (indication of upshifted condition)
>   Outline down arrow         (indication of downshifted (override) condition)
>   Key                        (indication of terminal physically locked (I think
>                              this may be what is meant by E0F6 above))
>   Stick figure               (terminal is connected to "operator" (really to a
>                              supervisory program))
>   Solid block                (terminal is connected to "application program")
>   4 in square box            (terminal is connected to 3274-type control unit)
>   6 in square box            (terminal is connected to 3276-type control unit)
>   Lightning bolt             (communication failure)
>   Rectangle with slash       (machine check)
>   Printer symbol with slash  (associated printer has an error condition)
>
These have been added in Draft 2 -- but just the ones not already in
Unicode (such as outline arrows, "4 in square box" which is really just
an inverse video "4" as far as a terminal is concerned, etc).

> and most problematic:
>   Left half of clock         (these two form a doublewidth clock (set at 6:10
>   Right half of clock        or 2:30, though I'm sure the time would be
>                              considered a matter of glyph - indeed at least
>                              one non-IBM manufacturer's clock symbol was 5:50
>                              or 10:30))
>
I don't have an actual 3270 terminal to look at just now, but I did manage
to scrape up the IBM 3270 Component Description manual, which lists (and
illustrates) all the special glyphs shown in the Operator Information Area,
in which there is nothing to suggest that the clock is made from two
character cells. In fact, it looks quite round to me :-)

Even if it is made from pieces, I assume there is no way to see them in
isolation, and so there should be no harm in encoding the clock as a single
glyph (and then, if necessary, show it in double size).

> Now it's entirely reasonable to argue that all the above (and I may
> have forgotten a couple) have no business being encoded at all.
> Indeed some terminal emulators use graphical means to produce the
> symbols. In any case there is nothing in the 3270 architecture that
> specifies use of any of them, and an emulator program can use other
> means to communicate the same information to the user. However a
> number of Windows-based emulators I know do use glyphs encoded in a
> font that they supply to produce at least a subset of the symbols.
> (It should be pointed out that a number of "ordinary" glyphs can also
> appear below the line, but I can think of no reason not to unify them
> with the upper case letters, numbers, and so on.)
>
Right. The reason for including the special glyphs appears at the top of
this message.

> That IBM doesn't include them in CDRA may be a good reason to exclude
> them from this proposal. But they can be genuinely useful for
> writers of emulators. What to do ? And how many clocks and stick
> figures is it reasonable to encode ?
>
In Draft 2, I'm listing one of each (I retired the SNI 3:00 clock and
stick figure with hat).

(Yes, I know that on the RS/6000 there is a little animated "running man"
who can stop, fall down, etc, as an indicator of the system status, but
that's above and beyond...)

Elliotte Rusty Harold <elharo@sunsite.unc.edu> wrote:
> > E0B4 Latin capital letter H with bar SNI Math 04/05 (2)
> > E0B5 Latin small letter h with bar SNI Math 04/06 (2)
>
> Is E0B5 supposed to be Planck's constant over 2*PI? If so, it's encoded at
> 210F, 0127, and 045B. And your E0B4 is at 0126.
>
Who knows what it's supposed to be! In any case, I looked harder and found
barred H's and T's, dotted L's, etc (which look just right for the SNI
character set), as well as some Engs, in Latin Extended A (U+0100..) and so
removed them from the proposal.
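
The code points Elliotte cites can be looked up by name with any tool that
carries the Unicode character database. As a small illustration, the Python
sketch below uses the standard unicodedata module. The helper name
describe is hypothetical.

```python
import unicodedata


def describe(code_points):
    """Return 'U+XXXX NAME' strings for the given code points."""
    return [f"U+{cp:04X} {unicodedata.name(chr(cp))}" for cp in code_points]


if __name__ == "__main__":
    # The four code points mentioned in the quoted message.
    for line in describe([0x0126, 0x0127, 0x210F, 0x045B]):
        print(line)
```

The names returned make it plain that these are distinct characters (two
Latin letters with stroke, the Planck constant symbol, and a Cyrillic
letter), even where the glyphs look alike.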

As a result of all your comments, and further research, Draft 2 should be
much tighter in terms of unifications, but also more complete -- win some,
lose some :-)

It's coming up in the next message. NOTE: If it is the sense of the readers
that these proposals should no longer be posted here, but rather just
pointers to them, I'm happy to comply. In case you want to skip the next
draft in email, the pointer is:

  ftp://kermit.columbia.edu/kermit/charsets/ucsterminal.txt

Thanks again!

- Frank


