Re: A basic question on encoding Latin characters

From: Mark E. Davis
Date: Tue Sep 28 1999 - 11:36:34 EDT

We should make it very clear that Normalization Form C does *not* eliminate
combining characters. It does precompose them where possible, but for many
scripts and characters it is not possible, or desirable.
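
To make that concrete, here is a minimal sketch using Python's standard
unicodedata module. The choice of language and the two sample strings are
mine, purely for illustration: NFC composes A + COMBINING DIAERESIS into one
code point, but leaves the combining mark in place when no precomposed
character exists.

```python
import unicodedata

def nfc_codepoints(text):
    """Return the code points of text after Normalization Form C."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]

# A + COMBINING DIAERESIS has a precomposed form (U+00C4).
composed = nfc_codepoints("A\u0308")

# q + COMBINING DIAERESIS has no precomposed form, so the mark survives NFC.
uncomposed = nfc_codepoints("q\u0308")

print(composed)    # ['U+00C4']
print(uncomposed)  # ['U+0071', 'U+0308']
```

So a receiver that assumes "NFC means no combining characters" will still
meet them for any base/mark pair that has no precomposed code point.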

Exactly the same problem that you discuss occurs with any script that requires
shaping. When I type an Arabic character, the previous character needs to change
shape. What the terminal needs to do is replace the glyph on the screen with a
different form. As I recall from my terminal days, the controls for doing this
are available. The same technique can be used for accents. Type an A, see an A.
Then type an umlaut, and the host picks it up, decides that it needs a composed
presentation form, and replaces the A by Ä on the screen. Of course, the display
on the terminal still depends on the "font" that it has, which may or may not
allow dynamic composition, but fundamentally I don't see the problem.
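
A rough sketch of the draw-then-replace idea, under stated assumptions: the
function name terminal_ops and the operation names "draw" and "replace_last"
are hypothetical, not real terminal controls, and composing the cell with NFC
merely stands in for whatever presentation form the display would pick.

```python
import unicodedata

def terminal_ops(stream):
    """Turn incoming characters into hypothetical screen operations.

    Each base character is drawn as soon as it arrives; a following
    combining mark triggers a redraw of the previous cell rather than
    holding the base character back.
    """
    cells = []   # what each screen cell currently holds
    ops = []     # (operation, text) pairs; the operation names are made up
    for ch in stream:
        if cells and unicodedata.combining(ch):
            # Re-compose the previous cell with the new mark and redraw it.
            cells[-1] = unicodedata.normalize("NFC", cells[-1] + ch)
            ops.append(("replace_last", cells[-1]))
        else:
            cells.append(ch)
            ops.append(("draw", ch))
    return cells, ops

cells, ops = terminal_ops("A\u0308b")
print(ops)  # [('draw', 'A'), ('replace_last', 'Ä'), ('draw', 'b')]
```

The A is visible the moment it arrives, and the umlaut only causes that one
cell to be redrawn, so nothing on the screen waits for a following character.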


Frank da Cruz wrote:

> > Um, at that time the normalization hadn't been done. So at that time there
> > weren't _technical_ reasons for drawing a line at the normalization
> > border. The line was drawn after that time. It could have been
> > before. But it has been drawn and there had better be really good reasons
> > offered if we are not to respect it.
> >
> In interactive telecommunications, we have the following situation:
> 1. Host sends "login:" (or any other prompt).
> 2. User is supposed to type her ID (or any other response).
> When using Unicode, the terminal emulator may not print the final character
> of the prompt because it doesn't know yet whether any combining characters
> will follow. So the user doesn't know whether the host is ready to receive
> a response and therefore should not reply since in some cases (e.g. at the
> UNIX "Password:" prompt) an early response is discarded.
> If the process is being executed by a script, the script sits and waits;
> "waitfor 'login:'" will not succeed, since it can not be known whether
> 'login:' has arrived until the next base character after ':' comes, but no
> such character is coming (I realize it is silly to expect a colon to have an
> accent but those are the rules -- and not all prompts end with colon).
> There is no escape from this situation other than introduction of a "higher
> level protocol" to signal "ok, I'm finished transmitting, now it's your
> turn", just like in the old half-duplex days.
> This is the kind of reason that telecommunications-oriented applications
> seem to be steering away from the Normalization Form D model, however
> appropriate it might be in other areas, and embracing Normalization Form C
> (ISO 10646 Level 1) and, by extension, precomposed characters, as we have
> seen in Plan 9 and now, it seems, Linux. I don't think this indicates
> recalcitrance or West European bias in UNIX culture as much as a desire to
> preserve telecommunications and the terminal/host model as a viable
> interface between human and machine in the Unicode age, as it has been since
> the beginning of the computer age. I also think it's no accident that Unicode
> is best supported on those platforms that have eschewed the terminal/host
> access model.
> - Frank
