Displaying APL overstrikes (was Re: Normalization Form KC for Linux)

From: Edward Cherlin (edward.cherlin.sy.67@aya.yale.edu)
Date: Sat Aug 28 1999 - 05:03:22 EDT

At 14:28 -0700 8/27/1999, Frank da Cruz wrote:
>Very few terminals are designed to allow composition of characters and
>the few that are do so for good reason (e.g. the ALA bibliographic
>character set, APL). I can't say whether composition is accomplished
>by having the combining characters come before or after the base
>character in these cases, but I suspect it's before.

I pulled this paragraph to the front so I can explain a few things before I
comment on other issues.

Short answer: There are no "base" and "combining" characters in APL (for
example, quad-quote and quote-quad both produce the same glyph, ⍞, and are
indistinguishable). It is never necessary to undo a line wrap or
scroll, as long as the correct line width for the terminal has been set in
the APL software.

My father was an APL consultant. Working with him, and later as Managing
Editor of APL News for Springer-Verlag, I have used a variety of APL
terminals (Diablo, Qume, and Spintronic daisywheels; IBM golfball; TI
Silent 745 thermal portable; Andersen-Jacobson and Tektronix video
terminals; APL terminal emulation on an Apple II; and the terminal
emulation functions of various APL programming packages on Z80 CP/M and
8086 CP/M-86 computers of several kinds, PCs, Macs, and UNIX systems). I
have seen but not used similar software for 68000 systems (Fortune, Atari,
Commodore) and IBM PowerPC systems. A group of enthusiasts put me in charge
of a project that put ANSI/ISO standard APL (no omissions, no deviations, a
few extensions) on 6502 computers (BBC Micro, Commodore, Apple), Z80
computers (Osborne, Nokia, and others), 8086 computers (PC, NEC),
Macintosh, and UNIX, with binary program and data file portability between
all platforms. We published manuals and tutorial material in English,
French, German, Finnish, Russian, and Japanese.

On APL printing terminals there were no dead keys. Overstrikes were typed
literally as character-backspace-character. This was simulated on the first
APL video terminals, so that no changes in keystrokes from the user or data
from the host would be required. The interpreter was required to accept
overstrikes in either order. Actually the user was allowed to come back
from later in the line, backspacing multiple times, typing the overstrike,
and spacing over to the end of the line again.
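
A minimal sketch of that model, in Python rather than APL: each print
column is treated as an unordered set of struck characters, so the order of
the two strikes cannot matter. The names strike_columns and BACKSPACE are
hypothetical, and treating a space as a pure cursor advance is an assumption
of this sketch, not a description of any particular interpreter.

```python
BACKSPACE = "\b"  # the code a printing terminal sends to move the carriage back


def strike_columns(keys):
    """Resolve a keystroke stream into one set of struck characters per column.

    Assumption: a space only advances the carriage and strikes nothing.
    """
    columns = []
    pos = 0
    for ch in keys:
        if ch == BACKSPACE:
            pos = max(pos - 1, 0)  # the carriage cannot move left of column 0
            continue
        while len(columns) <= pos:
            columns.append(set())
        if ch != " ":
            columns[pos].add(ch)  # a set forgets the order of the strikes
        pos += 1
    return [frozenset(col) for col in columns]


# Quad then quote, or quote then quad: the same column contents either way.
quad = "\u2395"
assert strike_columns(quad + "\b'") == strike_columns("'\b" + quad)

# Coming back from later in the line, overstriking, then spacing forward again.
assert strike_columns("A+B\b\b| ") == strike_columns("A+\b|B")
```

Because each column is a set, nothing in this representation could mark one
of the two strikes as "base" and the other as "combining".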

The overstrike model for creating new APL characters arose from the ability
of IBM printing terminals, based on IBM Selectric golfball typewriters, to
backspace and type another character over the first one, either under
operator control or program control. A good deal of thought went into the
design of the APL golfball characters, so that overstrikes would line up
properly. Daisywheel printers could also backspace, of course.

>The model of base character
>followed by nonspacing diacritics (as opposed to the other way around)
>does not mesh well with terminal/host communication, where incoming
>characters must be displayed in realtime. A letter A arrives and the
>terminal displays it in the current position and moves the cursor to the
>next position, which might be on a new line due to screen wrap, and
>this, in turn might cause scrolling, in some cases even off the screen
>due to narrow vertical margins. Then a nonspacing acute accent arrives.
>At this point, the terminal has to find where it left the A and change
>it to something else, but this time avoid the wrap since it was done
>previously, and then put the cursor back where it was before. (The
>situation might be even more confusing when the host controls wrapping.)

We never let the printer get to its farthest right position in APL. Users
could set the printwidth system variable QuadPW to a suitable length, and
could change it for 80 column terminals and 132 character printing
terminals. Thus there were no overflows (new line, scrolling) to undo. The
interpreter would fold lines (ugly, but no lost data) as often as necessary.
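
A rough sketch of that folding, in Python for illustration. The function
name fold and the six-space continuation indent are assumptions of the
sketch; the only point is that every emitted row fits within the print
width, so the terminal itself never has to wrap.

```python
def fold(line, print_width, indent=6):
    """Split one logical output line into rows no wider than print_width.

    Continuation rows are indented so a folded line is visibly different
    from a fresh one. The indent of 6 is an assumption of this sketch.
    """
    if print_width <= indent:
        raise ValueError("print_width must exceed the continuation indent")
    rows = [line[:print_width]]
    rest = line[print_width:]
    step = print_width - indent  # room left on a continuation row
    while rest:
        rows.append(" " * indent + rest[:step])
        rest = rest[step:]
    return rows


# A 25-character line on a device set to 10 columns: ugly, but nothing is lost.
rows = fold("x" * 25, 10)
assert all(len(row) <= 10 for row in rows)
assert "".join(row.strip() for row in rows) == "x" * 25
```

If the width given to fold is wider than the real device, the terminal
wraps on its own and the problem Frank describes comes back, which is why
the setting has to match the hardware.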

>This is an awful lot of work and screen changing for little benefit when
>precomposed characters are already available. In any case, changing a
>character after having already drawn it is not the best "human
>engineering" -- terminal users are not accustomed to having to reread
>text already read in case it changed, and this will be especially
>noticeable when a congested network delays the arrival of a diacritic.

From the beginning (around 1960) overstrike combinations were treated as
characters with their own code positions inside the interpreter. APL video
terminals naturally went to precomposed glyphs for output. The user had the
option on input of typing special key combinations for each overstrike, or
continuing to compose them.
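
To illustrate the idea of overstrikes having their own code positions, here
is a small Python lookup. The table uses present-day Unicode code points
for a handful of APL symbols purely as an example; it is not the internal
code table of any historical interpreter, and the names PRECOMPOSED and
compose are hypothetical.

```python
# Illustrative table only: unordered pairs of struck characters mapped to a
# single precomposed code point (Unicode APL functional symbols).
PRECOMPOSED = {
    frozenset({"\u2395", "'"}): "\u235e",       # quad + quote   -> quote quad
    frozenset({"\u2395", "\u00f7"}): "\u2339",  # quad + divide  -> quad divide
    frozenset({"\u25cb", "|"}): "\u233d",       # circle + stile -> circle stile
    frozenset({"\u2206", "|"}): "\u234b",       # delta + stile  -> delta stile
    frozenset({"\u2207", "|"}): "\u2352",       # del + stile    -> del stile
}


def compose(first, second):
    """Return the single character for an overstrike pair, or None if the
    pair is not in the table. The order of the arguments is irrelevant."""
    return PRECOMPOSED.get(frozenset({first, second}))


# Either typing order yields the same internal character.
assert compose("\u2395", "'") == compose("'", "\u2395") == "\u235e"
assert compose("a", "b") is None
```

Once a pair has been looked up this way, everything downstream (storage,
comparison, display) deals with one character, whichever way it was typed.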

>At the very least the redrawing of terminal-screen characters is likely
>to introduce unwanted and perhaps harmful flicker (I'm sure you've all
>read about the dangers of the "critical flicker frequency", e.g. to
>those who drive along those picturesque tree-lined French country roads
>at sunset :-).

You're reaching. :-) On today's systems, you wouldn't see the flicker.
Rendering both characters would almost always be complete before the screen
could retrace. You should worry more about people who scroll through files
a line at a time instead of paging. :-) :-)

>There are no such problems when reading Unicode data from a file, since
>we can always look ahead and collect all the diacritics before deciding
>which character to show, with no delays or deadlocks.

The case is similar for APL output, as long as the line width value the
interpreter is using is correct for the actual output device.
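
For the Unicode side of that comparison, the look-ahead Frank describes
can be sketched in Python with the standard unicodedata module. The
function name display_cells is hypothetical, and attaching a character to
the preceding base whenever its canonical combining class is nonzero is a
simplification (some combining marks have class 0 and would need a fuller
segmentation rule).

```python
import unicodedata


def display_cells(text):
    """Group each base character with the combining marks that follow it,
    then compose each group (NFC) before anything is drawn."""
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # nonzero combining class: attach to the base
        else:
            clusters.append(ch)  # class 0: start a new display cell
    return [unicodedata.normalize("NFC", cluster) for cluster in clusters]


# "A" followed by COMBINING ACUTE ACCENT is drawn once, already composed.
assert display_cells("A\u0301b") == ["\u00c1", "b"]
```

With the whole string in hand, nothing is ever drawn and then redrawn, which
is the property a realtime terminal stream does not have.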

>This is not to grumble about the final Unicode / ISO 10646 design, but
>suggest that there can be valid reasons for preferring precomposed
>versions of characters to decompositions when there is a choice and
>perhaps even requiring it in certain applications such as terminal

Precomposed glyphs, unquestionably. Precomposed characters, in some but not
all circumstances.

>There are countless (not as in "infinite" but as in "who can count?")
>versions of UNIX and lots of other non-UNIX platforms that use
>traditional character sets where we'd like to see Unicode make some
>headway. But there is little chance we can expect the keepers of
>all these diverse platforms to rip them apart from the bottom up to
>replace the character handling model at every level to accommodate
>composition of characters -- even if that's the right thing to do --
>without breaking accessibility to their platforms by users of
>traditional methods.

I don't see how implementing Unicode correctly would break any systems that
weren't broken already. Of course, implementing Unicode incorrectly would
break a lot of things. Making Unicode available in specific applications,
or in an API, doesn't mean you have to throw out older ones.

>It might be better to let Unicode get its
>foot in the door without upsetting everything and then grow of its
>own accord -- i.e. according to user demand.
> ^^^^^^^^^^^
>- Frank

Which is what will happen in any case.

Edward Cherlin   edward.cherlin.sy.67@aya.yale.edu
"It isn't what you don't know that hurts you, it's
what you know that ain't so."--Mark Twain, or else
some other prominent 19th century humorist and wit
