RE: A basic question on encoding Latin characters

From: Robert Brady (robert@ents.susu.soton.ac.uk)
Date: Tue Sep 28 1999 - 16:05:55 EDT


On Tue, 28 Sep 1999, Kenneth Whistler wrote:

> Once again, this would be a *protocol* error. If the communication protocol
> is waiting for "xxxxá", then it should act when it receives the final
> "á" as a unit, or if it has received an "a", then it should act when it
> receives the final combining acute accent. And ordinarily the communication
> protocol should specify a normalized form, so it doesn't have to deal
> with alternative forms as equivalent for these purposes.

You miss the point entirely when talking about "login:", so spectacularly
that it is almost funny. I find it hard to believe that you have never used
a TTY-based system, and never used a telnet client!

The terminal <-> host architecture works like this:

A program executing on a host writes to the terminal via some method
(in UNIX, this is typically done by writing to the STDOUT file
descriptor). The characters get sent, maybe by pipes, maybe by
sockets, maybe via a serial or parallel cable, to a terminal or a terminal
emulator.

The terminal emulator deals with certain special sequences, C0
control characters for example. One C0 character is ESC, which starts a
sequence of characters used to send control information (e.g. position
cursor, set color) to the terminal. This sequence is bounded: you know
when it started, you know when it has ended, and you know when you are in
the middle of it. Ordinary characters (ASCII, or whatever character set
you are using) are meant to be processed immediately.
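To make the "bounded" property concrete, here is a minimal Python sketch
(not any real terminal's code) of how an emulator might track escape
sequences as bytes trickle in. It assumes only the simple two-byte ESC x
form and the ESC [ ... form (CSI), where parameter bytes are followed by a
single final byte in the range 0x40 to 0x7E; the class name EscapeTracker
is made up for illustration, and plain bytes are treated as ASCII.

```python
ESC = 0x1B

class EscapeTracker:
    """Toy escape-sequence tracker: ASCII text plus CSI sequences only."""

    def __init__(self):
        # "ground" (normal text), "escape" (just saw ESC) or "csi" (saw ESC [)
        self.state = "ground"
        self.seq = bytearray()  # bytes of the escape sequence in progress

    def feed(self, data: bytes) -> list:
        """Process bytes as they arrive; return the events completed so far."""
        out = []
        for b in data:
            if self.state == "ground":
                if b == ESC:
                    self.state = "escape"          # a sequence has started
                    self.seq = bytearray([b])
                else:
                    out.append(("text", chr(b)))   # ordinary byte: handle at once
            elif self.state == "escape":
                self.seq.append(b)
                if b == ord("["):
                    self.state = "csi"             # still in the middle of it
                else:
                    # Two-byte sequence such as ESC c: complete right away.
                    out.append(("escape", bytes(self.seq)))
                    self.state = "ground"
            else:  # "csi"
                self.seq.append(b)
                if 0x40 <= b <= 0x7E:              # final byte: sequence has ended
                    out.append(("escape", bytes(self.seq)))
                    self.state = "ground"
        return out
```

Feeding b"\x1b[3" returns nothing and leaves the tracker in the "csi"
state; feeding b"1mok" afterwards completes the color sequence b"\x1b[31m"
and then passes "o" and "k" straight through. At every byte the tracker
knows whether it is inside a sequence, which is exactly what is missing
for trailing combining marks.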

When I "telnet" to a machine, it often invokes a program that prints
"login:" and waits for input. This is merely the first example of a
severe problem with implementing combining characters with the Unicode
model on a terminal: in Unicode, combining characters come AFTER the base
character. [And note, I do not propose that this be changed; it makes
sense for other reasons. I just want the problem acknowledged and
understood.]

The terminal emulator will obtain "login:" from the program, and will
display it immediately. So far, so good.

If the terminal reads Hello<double-exclamation-mark>, encoded in UTF-8,
this also works, because in UTF-8 the lead byte of each sequence says how
many bytes follow, so the terminal knows when each character is complete.
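A minimal Python sketch of that property: a streaming decoder that emits
each character the moment its last byte arrives, using only the lead byte
to know how long to wait. The names utf8_seq_len and Utf8Stream are
hypothetical, and continuation bytes are only validated when the finished
sequence is decoded.

```python
def utf8_seq_len(lead: int) -> int:
    """Length of a UTF-8 sequence, determined from its lead byte alone."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte 0x{lead:02X}")


class Utf8Stream:
    """Emit each character as soon as its final byte arrives (zero lookahead)."""

    def __init__(self):
        self.pending = bytearray()  # bytes of the character in progress
        self.need = 0               # total bytes that character will have

    def feed(self, data: bytes) -> list:
        out = []
        for b in data:
            if not self.pending:
                self.need = utf8_seq_len(b)  # lead byte announces the length
            self.pending.append(b)
            if len(self.pending) == self.need:
                # Complete: decode (this also rejects bad continuation bytes).
                out.append(self.pending.decode("utf-8"))
                self.pending.clear()
        return out
```

Feeding b"Hello\xe2\x80" yields the five letters of Hello at once and
holds the two bytes of U+203C DOUBLE EXCLAMATION MARK; the next byte,
b"\xbc", completes it without any further lookahead.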

But if a terminal receives merely Hello and renders it, and then, after a
delay, the final <combining-paperhat-above> arrives, what is it to do?

The only option is to go back and re-render the "o", this time with a
paper hat above it. This causes unpleasant flicker, which would not happen
if combining characters were placed before the base character: it would
then be possible to tell, with zero lookahead, when a combining sequence
ends (it ends at the base character).
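The two orderings can be compared with a toy renderer that records every
cell it paints. This is an illustration under stated assumptions, not real
terminal code: U+0302 COMBINING CIRCUMFLEX ACCENT stands in for the paper
hat, "combining mark" means Unicode general category M*, and
render_prefix models a hypothetical mark-before-base encoding that Unicode
does not use.

```python
import unicodedata

def is_mark(ch: str) -> bool:
    # General categories Mn, Mc and Me are Unicode's combining marks.
    return unicodedata.category(ch).startswith("M")

def render_postfix(stream: str) -> list:
    """Unicode order (base, then marks). Paint each character immediately;
    a late mark forces the previous cell to be painted again."""
    cells, paints = [], []
    for ch in stream:
        if is_mark(ch) and cells:
            cells[-1] += ch
            paints.append((len(cells) - 1, cells[-1]))  # repaint: flicker
        else:
            cells.append(ch)
            paints.append((len(cells) - 1, ch))
    return paints

def render_prefix(stream: str) -> list:
    """Hypothetical order (marks, then base). Marks are held back; the base
    character closes the sequence, so each cell is painted exactly once."""
    cells, paints, pending = [], [], ""
    for ch in stream:
        if is_mark(ch):
            pending += ch                 # nothing to draw yet
        else:
            cells.append(ch + pending)    # stored as base + marks for display
            paints.append((len(cells) - 1, cells[-1]))
            pending = ""
    return paints
```

render_postfix("Hello\u0302") makes six paints for five cells, painting
cell 4 twice; render_prefix("Hell\u0302o") paints each of the five cells
once, because the "o" itself tells the renderer the sequence is over.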

And please lose the contemptuous attitude toward terminals. I do almost all
my work in xterms. They are not obsolete, regardless of your wishes.

-- 
Robert



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:53 EDT