Re: A basic question on encoding Latin characters

From: Mark E. Davis
Date: Wed Sep 29 1999 - 00:18:23 EDT

Perhaps you could explain a bit more. Your environment is probably different
enough that I am finding it a bit tricky to understand the constraints you are
working under.

In the terminal systems I worked with (ages ago, so my memory is hazy), you always
took some action to indicate that your input was complete: tab to get out of a
field, enter to finish the record. Is your environment different?

In the course of the discussion, you shifted a bit from keyboard input to scripts.
I have a followup question about that.
Suppose that a script in your environment is supposed to recognize "Smit", but is
not supposed to recognize "Smith".
How is this any different from the script that is supposed to recognize "Sma", but
is not supposed to recognize "Sma<ring>" (e.g. Små)?
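
To make the comparison concrete, here is a minimal sketch in Python. The function names and the `input_complete` flag are illustrative assumptions, not part of any real scripting tool; the flag stands in for whatever tells the script that no more input is coming (a tab, an enter, a timeout). A matcher that fires as soon as the target has arrived is fooled by a following "h" in exactly the same way as by a following combining ring (U+030A).

```python
def matches_prefix(received: str, target: str) -> bool:
    """Fire as soon as the target has arrived, ignoring what may follow."""
    return received.startswith(target)


def matches_whole(received: str, target: str, input_complete: bool) -> bool:
    """Fire only when the input is known to be complete and equals the target.

    `input_complete` is a hypothetical signal (end-of-field, enter, timeout).
    """
    return input_complete and received == target


# The eager matcher cannot tell "Smit" from "Smith" ...
eager_latin = matches_prefix("Smith", "Smit")
# ... nor "Sma" from "Sma" followed by COMBINING RING ABOVE.
eager_combining = matches_prefix("Sma\u030a", "Sma")

# With an end-of-input signal, both cases are rejected the same way.
strict_latin = matches_whole("Smith", "Smit", input_complete=True)
strict_combining = matches_whole("Sma\u030a", "Sma", input_complete=True)
strict_exact = matches_whole("Smit", "Smit", input_complete=True)
```

In this sketch the combining character needs no special treatment: both cases turn on the same question of whether the script knows the input has ended.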


Frank da Cruz wrote:

> Ken wrote:
> > Sounds like a purty flimsy strawman to me.
> >
> It might well be.
> > > But if no more characters are coming
> > > (e.g. until there is some kind of response) then it would be [a match],
> > > but how can the script know?
> >
> > By the EOL or other end-of-content marking built into the protocol.
> >
> But there is no protocol. Most prompts do not end with an EOL.
> A script is by nature an attempt to codify human behavior in a
> stimulus-response situation. The stimuli are designed for people, not
> protocols, and in any case are usually not changeable (maybe you can change
> them, but as soon as you do be prepared for screams of agony to go up from
> the masses who, unbeknownst to you, depend for their livelihood on the prompts
> not changing). Thus the script must adapt to whatever is on the other end
> of the connection.
> If the prompt is "login:" with no EOL, we can't force an EOL to come; ditto
> for other dialog situations in which the prompt is more likely to end with some
> character that might reasonably be followed by a combining character (or not).
> > ... ordinarily the communication
> > protocol should specify a normalized form, so it doesn't have to deal
> > with alternative forms as equivalent for these purposes.
> >
> I believe this is what telecommunications-oriented platforms and/or
> applications are doing when they avoid the issue of combining forms by saying
> they don't support them.
> > ... as the Unicoders have continually pointed out, Implementation Level 1
> > is a crutch for brain-damaged implementations that cannot handle anything
> > complex. It rules out support for all of the complex scripts of the world.
> >
> Meaning Indic, Arabic, etc... Of course this is true, and yet Level 1 exists
> and developers will use it. We have in UTF-8 a vigorous attempt to embrace
> the "legacy" terminal/host world and existing applications to promote easy
> migration from ASCII to Unicode (and somewhat less easy from 8-bit character
> sets). But these very platforms are accessed in a simple and open manner
> which does not mesh well with complex scripts.
>
> We might wish to wipe away the legacy of fifty years of computing and start
> over (in more ways than one!) but I fear there will never be a replacement
> for the simple and open terminal/host access method that will support
> complex scripts and still be as open and vendor-neutral as the terminal/host
> model. We are suffering already from the lack of open (e.g. Telnet) access
> to Macintosh and Windows platforms.
>
> I'm not saying I know what to do, only that "throw away your medieval tools
> and enter the modern age" is as likely to result in a new Tower of Babel as
> it is to promote universal communication. But this time the Babel is not in
> character sets but in the profusion of ever-changing and incompatible
> vendor- and application-specific protocols and data formats.
>
> Perhaps it's all a tempest in a teapot. For some time to come we will have
> all possible combinations of "legacy" and Unicode-aware hosts and clients,
> and we have to allow for each combination. Different problems will come up
> in each configuration, and we'll see how to deal with them. My hope is that
> it will not be by inventing a neverending stream of Three-Letter Acronyms to
> "comply" with, on top of Unicode itself, just to get text from point A to
> point B. If you thought you hated ISO 2022, just think of the standards
> nightmare that will grow out of that!
> - Frank
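
Ken's quoted point about specifying a normalized form can be illustrated with a short Python sketch. It assumes both sides agree on Unicode Normalization Form C (NFC); the helper name `same_text` is made up for illustration. It uses the standard-library `unicodedata.normalize` function to show that the precomposed and decomposed spellings of "Små" differ as raw code point sequences but compare equal once normalized.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after converting both to NFC (assumed agreed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


decomposed = "Sma\u030a"   # 'a' followed by COMBINING RING ABOVE
precomposed = "Sm\u00e5"   # LATIN SMALL LETTER A WITH RING ABOVE

raw_equal = decomposed == precomposed          # compares code points as sent
normalized_equal = same_text(decomposed, precomposed)
nfc_length = len(unicodedata.normalize("NFC", decomposed))
```

Normalization does not remove the end-of-input question: a receiver holding "Sma" still cannot finish normalizing until it knows whether a combining character follows.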
