Re: A basic question on encoding Latin characters

From: Frank da Cruz (fdc@watsun.cc.columbia.edu)
Date: Tue Sep 28 1999 - 10:10:38 EDT


> Um, at that time the normalization hadn't been done. So at that time there
> weren't _technical_ reasons for drawing a line at the normalization
> border. The line was drawn after that time. It could have been
> before. But it has been drawn and there had better be really good reasons
> offered if we are not to respect it.
>
In interactive telecommunications, we have the following situation:

 1. Host sends "login:" (or any other prompt).
 2. User is supposed to type her ID (or any other response).

When using Unicode, the terminal emulator may not print the final character
of the prompt because it doesn't know yet whether any combining characters
will follow. So the user doesn't know whether the host is ready to receive
a response and therefore should not reply since in some cases (e.g. at the
UNIX "Password:" prompt) an early response is discarded.

If the process is being executed by a script, the script sits and waits;
"waitfor 'login:'" will not succeed, since it cannot be known whether
'login:' has arrived until the next base character after ':' comes, but no
such character is coming (I realize it is silly to expect a colon to have an
accent but those are the rules -- and not all prompts end with colon).
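
To make the stall concrete, here is a minimal Python sketch of a strict receiver under the rules described above. The names committed_text and waitfor are hypothetical, chosen only for this illustration, and treating every character in a Unicode "Mark" general category as a combining character is a simplifying assumption.

```python
import unicodedata


def committed_text(received: str) -> str:
    """Return the prefix of `received` that is known to be complete.

    The last base character, plus any combining marks already received
    after it, is withheld because further marks could still arrive.
    """
    i = len(received)
    # Step back over trailing combining marks (general categories Mn, Mc, Me).
    while i > 0 and unicodedata.category(received[i - 1]).startswith("M"):
        i -= 1
    # Also withhold the base character those marks would attach to.
    if i > 0:
        i -= 1
    return received[:i]


def waitfor(target: str, received: str) -> bool:
    """Hypothetical script primitive: match only against committed text."""
    return target in committed_text(received)


# The host has sent the whole prompt, but the colon might still get an accent.
stalled = waitfor("login:", "login:")
# Only a following base character (here a space) releases the colon.
released = waitfor("login:", "login: ")
```

With only "login:" received, the colon is still pending and the match fails; it succeeds only once some later base character arrives, which at a prompt the host never sends.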

There is no escape from this situation other than introduction of a "higher
level protocol" to signal "ok, I'm finished transmitting, now it's your
turn", just like in the old half-duplex days.

This is the kind of reason that telecommunications-oriented applications
seem to be steering away from the Normalization Form D model, however
appropriate it might be in other areas, and embracing Normalization Form C
(ISO 10646 Level 1) and, by extension, precomposed characters, as we have
seen in Plan 9 and now, it seems, Linux. I don't think this indicates
recalcitrance or West European bias in UNIX culture as much as a desire to
preserve telecommunications and the terminal/host model as a viable
interface between human and machine in the Unicode age, as it has been since
the beginning of the computer age. I also think it's no accident that Unicode
is best supported on those platforms that have eschewed the terminal/host
access model.
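
As a rough illustration of the difference between the two forms, the following Python sketch uses the standard unicodedata module to compose a decomposed string. The sample strings are invented for this example. Note that Form C only composes sequences that have a precomposed equivalent, so it reduces the problem rather than removing it.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT (decomposed, as in Form D).
decomposed = "Jose\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# In Form C the accented letter arrives as one code point, U+00E9.
decomposed_length = len(decomposed)
composed_length = len(composed)
last_char = composed[-1]

# 'q' with an acute accent has no precomposed form, so Form C leaves the
# combining mark in place.
still_combining = unicodedata.normalize("NFC", "q\u0301")
```

Here the decomposed string is five code points and the composed one is four, so a receiver of the composed form gets the final letter complete in a single unit.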

- Frank


