Re: A basic question on encoding Latin characters

From: Paul Keinanen (keinanen@sci.fi)
Date: Wed Sep 29 1999 - 03:12:48 EDT


On Tue, 28 Sep 1999 08:37:14 -0700 (PDT) Francois Yergeau wrote:

>À 07:10 1999-09-28 -0700, Frank da Cruz a écrit :
>>In interactive telecommunications, we have the following situation:
>>
>> 1. Host sends "login:" (or any other prompt).
>> 2. User is supposed to type her ID (or any other response).
>>
>>When using Unicode, the terminal emulator may not print the final character
>>of the prompt because it doesn't know yet whether any combining characters
>>will follow.
>
>There is no good reason for the terminal not to print the final character
>when received. If a combining character comes later, the terminal simply
>has to redisplay the combination over the previous glyph.

There is one situation in which this does not work.

Assume you have a hardcopy terminal with a print wheel. Admittedly a
rare situation these days, but I would not be surprised to find some
applications in which the interaction with a system has to be recorded
in hardcopy and the listings have to be stored for up to 5 or 10
years.

If the print wheel has both the base character and the combination of
base character + specific combining mark (with no precomposed character
available) as separate glyphs, how should this be handled? This might
happen with some newly encoded language that does not get new
precomposed characters. Erasing the base character glyph from the
paper is not an option when the combining mark arrives later, after
some unpredictable delay. Such delays are typical of asynchronous
connections (in particular with statistical multiplexors) and of TCP/IP
pipes (in which the base character and the combining mark might be
transmitted in separate low level IP frames with unpredictable delay
in between). Thus setting some arbitrary timeout after the base
character has been received, before the print operation starts, is out,
since it would make remote echo operation intolerably slow.

Using the combining mark as a separate printable glyph, then backspacing
and overprinting the base character, is not a good option either, since
any base character glyph that might receive a combining mark in
this language would have to be reduced in size to leave room for the
mark. To keep the text uniform, all other glyphs would
have to be reduced as well, even if they are not expected to receive any
combining marks in this language.
 
There would not have been any such problems if the combining marks
had arrived _before_ the base character, in which case the
combining marks would simply be buffered and the glyph selected when
the base character finally arrives. Any extra telecommunication delays
would be suffered in remote echo mode only with combined characters,
not with all base characters. Unfortunately Unicode puts the combining
marks after the base character, so this does not work.
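
A minimal Python sketch of the difference between the two orderings.
The function name held_back and the marks_first flag are illustrative
assumptions, and "combining mark" is approximated here as any code
point whose Unicode general category starts with "M". The sketch only
tracks what a print-wheel terminal would still be holding back once a
given string has fully arrived.

```python
import unicodedata


def is_mark(ch):
    # Approximation: treat general categories Mn, Mc and Me as combining marks.
    return unicodedata.category(ch).startswith("M")


def held_back(stream, marks_first):
    """Return the text still unprinted after the whole stream has arrived."""
    pending = ""
    for ch in stream:
        if marks_first:
            # Hypothetical order (not Unicode): marks are buffered and the
            # base character completes the glyph, so it prints at once.
            pending = pending + ch if is_mark(ch) else ""
        else:
            # Unicode order: a non-mark releases the previous cluster, but
            # stays pending itself because marks may still follow it.
            pending = pending + ch if is_mark(ch) else ch
    return pending
```

With the Unicode order, held_back("login:", marks_first=False) leaves
the final ":" unprinted, which is the prompt problem described earlier
in the thread. With marks first, nothing is left pending after a base
character.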

The only solution I can think of to the original problem with prompts is
to add, at the end of the prompt string, a dummy character (such as
U+0000 or U+FEFF) that cannot be followed by combining marks.
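
A rough Python sketch of how a terminal might use such a dummy
character. The names TERMINATOR and printable_now are hypothetical, the
choice of U+FEFF simply follows the suggestion above, and combining
marks are again approximated by general category "M". This is one
possible reading of the idea, not a description of any existing
terminal.

```python
import unicodedata

# Assumption: the dummy character agreed between host and terminal.
# U+0000 could be substituted here.
TERMINATOR = "\uFEFF"


def is_mark(ch):
    # Approximation: treat general categories Mn, Mc and Me as combining marks.
    return unicodedata.category(ch).startswith("M")


def printable_now(received):
    """Split what has arrived into (committed clusters, still-pending text)."""
    clusters, pending = [], ""
    for ch in received:
        if is_mark(ch):
            pending += ch  # attach the mark to the waiting base character
            continue
        if pending:
            clusters.append(pending)  # any non-mark releases the previous cluster
        # The terminator is consumed and never printed.
        pending = "" if ch == TERMINATOR else ch
    return clusters, pending
```

Without the terminator, printable_now("login:") still holds ":" as
pending. With it, printable_now("login:" + TERMINATOR) commits the
whole prompt and leaves nothing pending.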

Paul Keinänen
 



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:53 EDT