RE: A basic question on encoding Latin characters

From: Kenneth Whistler (kenw@sybase.com)
Date: Tue Sep 28 1999 - 18:02:29 EDT


Robert snapped back:

>
> On Tue, 28 Sep 1999, Kenneth Whistler wrote:
>
> > Once again, this would be a *protocol* error. If the communication protocol
> > is waiting for "xxxxá", then it should act when it receives the final
> > "á" as a unit, or if it has received an "a", then it should act when it
> > receives the final combining acute accent. And ordinarily the communication
> > protocol should specify a normalized form, so it doesn't have to deal
> > with alternative forms as equivalent for these purposes.
>
> You miss the point entirely when talking about "login:", so spectacularly it
> is almost funny. I find it hard to believe that you have never used a
> TTY-based system, and never used a telnet client!
>

Discussion of telnet basics omitted.

>
> The terminal emulator will obtain "login:" from the program, and will
> display it immediately. So far, so good.
>
> If the terminal read Hello<double-exclamation-mark>, encoded in UTF-8,
> this would also work, because of the properties of UTF-8.
>
> But if a terminal reads merely Hello and renders it, and then, after a
> delay, receives the final <combining-paperhat-above>, what is it to do?
>
> The only option is to go back and rerender the "o", only with a paper hat
> above. This causes unpleasant flicker, which wouldn't happen if combining
> characters were placed before the base character, as it would be possible
> to tell with zero-lookahead when a combining sequence ends.

If this is the "problem" you are concerned about, then please go back and
read some of the other responses on this thread, from Mark Davis and
in particular François Yergeau:

"There is no good reason for the terminal not to print the final character
when received. If a combining character comes later, the terminal simply
has to redisplay the combination over the previous glyph. This is what our
Arabic terminals and emulators have been doing for years (e.g. receive an
Arabic letter and display it in final form; receive another letter,
redisplay the previous one in middle form and the new one in final form)."
    --Yergeau

You have to do this for any complex script handled in a terminal
emulation. Too bad it turns out that the Latin script is also a
complex script.
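
The redisplay approach Yergeau describes can be sketched roughly as follows. This is only an illustration, not code from any real terminal emulator: the function name render, the list of cells, and the operation log are all hypothetical, and treating every code point whose general category starts with "M" as a trailing mark is a simplifying assumption.

```python
import unicodedata

def render(stream):
    """Simulate a terminal that draws each character as soon as it arrives.

    Returns (cells, ops): cells holds the text shown in each column,
    ops is a log of ("draw" | "redraw", column, text) actions.
    Hypothetical model, for illustration only.
    """
    cells = []
    ops = []
    for ch in stream:
        # Assumption: any mark (general category M*) attaches to the previous cell.
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and cells:
            # The mark arrived late: repaint the previous cell with the mark added.
            cells[-1] += ch
            ops.append(("redraw", len(cells) - 1, cells[-1]))
        else:
            # Ordinary character: draw it immediately in the next column.
            cells.append(ch)
            ops.append(("draw", len(cells) - 1, ch))
    return cells, ops

# "Hello" followed, possibly after a delay, by COMBINING CIRCUMFLEX ACCENT.
cells, ops = render("Hello\u0302")
for op in ops:
    print(op)
```

The base letter is never held back; the only extra cost of the late mark is one redraw of the column that already holds the "o".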

But this is mostly beside the point, since terminal emulation protocols
can simply declare that they will support only data in Normalization
Form C and limit their subset to what their fonts can handle, and they
don't have to change anything except their encoding support.
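
For illustration, here is what restricting data to Normalization Form C looks like using Python's standard unicodedata module. The helper name to_nfc is hypothetical; the point is only that the decomposed and precomposed spellings of "á" collapse to a single code point.

```python
import unicodedata

def to_nfc(text: str) -> str:
    """Return text in Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)

decomposed = "a\u0301"   # "a" followed by COMBINING ACUTE ACCENT
precomposed = "\u00e1"   # LATIN SMALL LETTER A WITH ACUTE

# The two spellings differ as code point sequences...
print(decomposed == precomposed)
# ...but are identical once both are in NFC.
print(to_nfc(decomposed) == to_nfc(precomposed))
print(len(to_nfc(decomposed)))
```

A protocol that admits only NFC data never sees the two-code-point spelling of a character that has a precomposed form.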

The strawman I was addressing was Frank's implication that some
processes would fail in this realm because they would sit and hang
waiting forever for the combining mark that never showed up for dinner,
or that you would get false negative and false positive matches that
couldn't be programmed around because of the placement of combining
marks. -- Not the redisplay flicker issue on a terminal when rendering
trailing combining marks.
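
A minimal sketch of a prompt matcher that does not hang, assuming the protocol compares text in a single normalized form (NFC here). The function name prompt_seen and the character-at-a-time loop are hypothetical, not taken from any real expect-style tool.

```python
import unicodedata

def prompt_seen(received: str, expected: str) -> bool:
    """True once the text received so far ends with the expected prompt.

    Both sides are compared in NFC, so "a" + U+0301 matches U+00E1.
    The check never blocks: it only inspects what has already arrived.
    """
    return unicodedata.normalize("NFC", received).endswith(
        unicodedata.normalize("NFC", expected)
    )

# The host sends the prompt in decomposed form, one code point at a time.
received = ""
for ch in "xxxxa\u0301":
    received += ch
    print(repr(ch), prompt_seen(received, "xxxx\u00e1"))
```

The matcher reports False after the bare "a" and True only once the combining acute accent has arrived; had the host sent the precomposed "á", it would fire on that single code point instead.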

If this were just the issue of prompts, do you think a user who is
expecting a prompt ending in "á" is going to wait around for the host
to deliver the combining acute accent when they see that "a" sitting on the
screen? Of course not. They'll just assume the programmer couldn't spell
right and will type in their response. If the host blows up in response,
or delivers a message "Wait! I haven't finished posting the prompt yet."
then they will assume--correctly--that the software is broken.

In any case, I don't see a Unicode *conformance* issue here, which is
where Frank started this thread.

>
> And lose the contemptuous attitude to terminals please. I do almost all my
> work in xterms. They are not obsolete, regardless of your wishes.

I have never said on this list that xterms or other terminal emulators were
obsolete. I use them myself all the time. Such words might have passed
my lips regarding "terminals" as pieces of equipment, since it is generally
accurate that they are obsolescent and are disappearing into marginal
nooks and crannies. I think the last time I actually physically placed
my fingers on a *real* terminal to talk to a UNIX machine was 15 years ago.

Some on the list may have corporate agendas that would be furthered
by stamping out xterms and that whole approach to computing, as
Frank implied. I am not one of them. I see the value of xterms and
their need. But I just want the developers to get on with the business
of providing proper Unicode support in them and to stop whining about
combining marks. If the character handling model limits what you can
support, then own up to the limits and don't complain when someone invents
a different protocol to extract data out of hosts and display it correctly.

If anything puts my knickers in a twist, it is the implication by
some that it is the architecture of *Unicode* that is causing all
this problem for terminal emulation support of the languages of
the world. Man, the utter temerity of all those scribes and typographers
over the last three millennia, not to foresee the problem and keep their
writing systems simple, small, and linear, so that they would not
cause problems in extending ASCII-based computer protocols.

--Ken

>
> --
> Robert
>
>


