Re: A basic question on encoding Latin characters

From: Mark E. Davis (markdavis@ispchannel.com)
Date: Tue Sep 28 1999 - 12:07:41 EDT


Great minds think alike!

Mark

Marco.Cimarosti@icl.com wrote:

> I am not sure I understood correctly, but it seems to me that you are basing
> your observation on the very peculiar behavior of your application.
>
> I understand that your hypothetical terminal software is trying to render
> Unicode text as soon as it arrives, CHARACTER BY CHARACTER.
>
> The font used by your terminal, I understand, has no combining characters.
> So, each time it receives an e (say), it has to wait for the next character
> to see whether it is a combining ^ (say), because in that case the
> two-character sequence would be converted to ê.
>
> From a broader perspective, this is not the major problem I see in your
> design: trying to render Unicode text on a per-character basis would NEVER
> work with many other features of Unicode.
>
> For instance, imagine that you receive text in Arabic or in an Indic script.
> Because of the way these scripts are specified in Unicode, you need to have
> a whole "block" (i.e. a line or paragraph) of text before you have all the
> information you need for the complex processes of bidirectional reordering,
> Indic reordering, and contextual shaping required by these writing systems.
>
> But there is no need for exotic alphabets or combining accents to screw up
> your design: sticking to good old ASCII, what would your modem script do if
> the prompt "login:" were translated into the Italian "codice d'accesso:"?
> It would wait, I think, until the Italian government changes the
> constitution to drop Italian and adopt English as the official language.
>
> If such a medieval design cannot be avoided because of technical
> constraints, it would be wiser, in my opinion, to do one of the following:
>
> - support Unicode only after login;
>
> - limit Unicode support in the login phase to the ASCII range (U+0000 to
> U+007F) or, at most, to Latin-1 (U+0000 to U+00FF), and not even try to
> implement relatively complex things such as combining accents;
>
> - require that the prompt and the answer be on separate lines: in this case,
> the line terminator character(s) would act as the "higher level protocol"
> you suggested, signaling "ok, I'm finished transmitting, now it's your
> turn";
>
> - re-engineer the login and terminal software entirely, using more
> up-to-date techniques.
>
> Regards.
> Marco Cimarosti
>
> > -----Original Message-----
> > From: Frank da Cruz [SMTP:fdc@watsun.cc.columbia.edu]
> > Sent: 1999 September 28, Tuesday 16.10
> > To: Unicode List
> > Cc: unicode@unicode.org
> > Subject: Re: A basic question on encoding Latin characters
> >
> > > Um, at that time the normalization hadn't been done. So at that time there
> > > weren't _technical_ reasons for drawing a line at the normalization
> > > border. The line was drawn after that time. It could have been
> > > before. But it has been drawn and there had better be really good reasons
> > > offered if we are not to respect it.
> > >
> > In interactive telecommunications, we have the following situation:
> >
> > 1. Host sends "login:" (or any other prompt).
> > 2. User is supposed to type her ID (or any other response).
> >
> > When using Unicode, the terminal emulator may not print the final character
> > of the prompt because it doesn't know yet whether any combining characters
> > will follow. So the user doesn't know whether the host is ready to receive
> > a response, and therefore should not reply yet, since in some cases (e.g. at
> > the UNIX "Password:" prompt) an early response is discarded.
> >
> > If the process is being executed by a script, the script sits and waits;
> > "waitfor 'login:'" will not succeed, since it cannot be known whether
> > 'login:' has arrived until the next base character after ':' comes, but no
> > such character is coming (I realize it is silly to expect a colon to have an
> > accent, but those are the rules -- and not all prompts end with a colon).
> >
> > There is no escape from this situation other than the introduction of a
> > "higher level protocol" to signal "ok, I'm finished transmitting, now it's
> > your turn", just like in the old half-duplex days.
> >
> > This is the kind of reason why telecommunications-oriented applications
> > seem to be steering away from the Normalization Form D model, however
> > appropriate it might be in other areas, and embracing Normalization Form C
> > (ISO 10646 Level 1) and, by extension, precomposed characters, as we have
> > seen in Plan 9 and now, it seems, Linux. I don't think this indicates
> > recalcitrance or West European bias in UNIX culture so much as a desire to
> > preserve telecommunications and the terminal/host model as a viable
> > interface between human and machine in the Unicode age, as it has been
> > since the beginning of the computer age. I also think it's no accident that
> > Unicode is best supported on those platforms that have eschewed the
> > terminal/host access model.
> >
> > - Frank
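The hold-back behavior Frank and Marco describe can be sketched in a few lines of Python. This is a hypothetical renderer (the class HoldBackRenderer and its methods are illustrative names, not taken from any real terminal emulator): it draws a base character only once the next non-combining character arrives, composing any accumulated marks with NFC first. It simplifies "combining mark" to "nonzero canonical combining class", which is an assumption rather than the full Unicode definition.

```python
import unicodedata


class HoldBackRenderer:
    """Hypothetical character-at-a-time renderer (illustration only).

    A base character cannot be drawn until the next character arrives,
    because a combining mark might still follow and change the glyph.
    Simplification: any character with a nonzero canonical combining
    class is treated as a combining mark.
    """

    def __init__(self):
        self.pending = ""  # last base character plus combining marks so far
        self.screen = []   # NFC-composed clusters actually displayed

    def feed(self, ch):
        if unicodedata.combining(ch):
            # A combining mark extends the pending cluster; still not drawable.
            self.pending += ch
            return
        # A new base character proves the previous cluster is complete.
        if self.pending:
            self.screen.append(unicodedata.normalize("NFC", self.pending))
        self.pending = ch

    def displayed(self):
        return "".join(self.screen)


# The final ':' of the prompt is never displayed: nothing follows it.
r = HoldBackRenderer()
for ch in "login:":
    r.feed(ch)
print(repr(r.displayed()), repr(r.pending))  # 'login' ':'

# "e" + U+0302 COMBINING CIRCUMFLEX ACCENT is shown as a single "ê",
# but only after the following space arrives.
r2 = HoldBackRenderer()
for ch in "e\u0302 ":
    r2.feed(ch)
print(r2.displayed())  # ê
```

The colon stays pending indefinitely, which is exactly why a script doing "waitfor 'login:'" against such a renderer's output never succeeds.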


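Marco's third option, using the line terminator as the "finished transmitting" signal, might look like the minimal sketch below. LineRenderer is again a hypothetical name, and treating LF (with an optional preceding CR) as the terminator is an assumption; the point is that NFC is applied only to a complete line, so nothing is ever held back waiting for a combining mark.

```python
import unicodedata


class LineRenderer:
    """Hypothetical renderer (illustration only) that treats the line
    terminator as the "ok, I'm finished transmitting" signal."""

    def __init__(self):
        self.buffer = ""  # characters of the line currently being received
        self.lines = []   # completed, NFC-composed lines ready to display

    def feed(self, ch):
        if ch == "\n":
            # Assumed protocol: LF ends the line, so it is safe to compose
            # and display it. Drop the CR of a CRLF pair.
            line = self.buffer[:-1] if self.buffer.endswith("\r") else self.buffer
            self.lines.append(unicodedata.normalize("NFC", line))
            self.buffer = ""
        else:
            self.buffer += ch


r = LineRenderer()
for ch in "login:\r\n":
    r.feed(ch)
print(r.lines)  # ['login:']
```

Here the prompt is released whole, colon included; the price, as Marco notes, is that the prompt and the answer must be on separate lines.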

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:53 EDT