RE: RE: A basic question on encoding Latin characters

From: Kevin Bracey (kevin.bracey@pacemicro.com)
Date: Thu Sep 30 1999 - 04:57:25 EDT


In message <199909291913.MAA20938@unicode.org>
          Marco.Cimarosti@icl.com wrote:

> I keep not understanding where the problem is with these terminals and
> combining characters.
>
> The fact is that the polling (receiving) of BINARY characters (encoded in
> Unicode, ASCII, GB, or whatever) and their visualization should be two
> completely separate and unrelated things.
>
> The part that receives BINARY characters (from remote or local) should not
> even imagine the existence of things like combining characters, or the bidi
> algorithm, or canonical or heretical (de)composition.
>

The basic problem as I see it is that combining character sequences
(e.g. A + combining acute) are to be considered equivalent to precomposed
characters. Thus one is no longer looking for individual characters; one
is looking for combining character sequences, and a combining character
sequence has no marked end point. You only know it has ended when the
next non-combining character arrives.
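
To make the equivalence concrete, here is a minimal sketch in Python using
the standard unicodedata module. The helper name canonically_equal is my
own invention for illustration; comparing NFC forms is one common way to
test canonical equivalence, not the only one.

```python
import unicodedata

decomposed = "A\u0301"   # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT
precomposed = "\u00C1"   # LATIN CAPITAL LETTER A WITH ACUTE


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The raw code point sequences differ...
print(decomposed == precomposed)
# ...but they compare equal once both are normalized.
print(canonically_equal(decomposed, precomposed))
# A receiver that has seen only "A" cannot know the sequence is complete:
# U+0301 has a non-zero combining class and may still arrive.
print(unicodedata.combining("\u0301"))
```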

The example in hand is an automated engine talking to an interactive
server, waiting for the "login:" prompt. If ":-acute" is to be treated
as different from ":", then you mustn't match "login<:-acute>". Because
a combining character can technically follow any other character, any
interactive string-matching situation becomes ambiguous: having received
"login:", the engine cannot know whether a combining mark is still to come.
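
A small sketch of the ambiguity, assuming the engine receives text in
chunks and checks the buffer after each one. The function
naive_prompt_seen and the chunk boundaries are hypothetical, chosen only
to show the timing problem.

```python
def naive_prompt_seen(buffer: str, prompt: str = "login:") -> bool:
    """Naive check: does the text received so far end with the prompt?"""
    return buffer.endswith(prompt)


# The server actually sends "login" followed by a colon with an acute
# accent, but the combining mark arrives in a later chunk.
chunks = ["log", "in:", "\u0301"]

buffer = ""
hits = []
for chunk in chunks:
    buffer += chunk
    hits.append(naive_prompt_seen(buffer))

print(hits)  # [False, True, False]
```

The matcher fires after the second chunk, before the combining acute has
arrived, even though the complete text does not end in a plain ":".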

Another circumstance which exhibits this problem is HTML. "<" marks the
start of a markup tag. However, "<" + combining long solidus overlay
(U+0338), which is canonically equivalent to U+226E NOT LESS-THAN, doesn't,
nor does any other combined form of "<". So a Unicode-capable browser must
presumably take only a "<" followed by a non-combining character as the
start of a tag.
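
As a rough sketch of that rule, assuming "combining character" means any
code point in general category Mn, Mc or Me. The names is_mark and
starts_markup are hypothetical, and this is not how any particular
browser is known to behave.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def starts_markup(text: str, i: int) -> bool:
    """True if text[i] is a '<' that is not the base of a combining sequence."""
    if text[i] != "<":
        return False
    return i + 1 == len(text) or not is_mark(text[i + 1])


print(starts_markup("<b>bold</b>", 0))   # plain "<": start of a tag
print(starts_markup("a <\u0338 b", 2))   # "<" + U+0338: not a tag
# The combined form is canonically equivalent to U+226E NOT LESS-THAN.
print(unicodedata.normalize("NFD", "\u226E") == "<\u0338")
```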

The HTML case can be dealt with fairly straightforwardly. In the
interactive case, either there is going to have to be some sort of control
character as a record marker, or, in the "login:" case, one just accepts
"login:" + any combining characters as a valid prompt: you wait for
"login:" and then discard any following combining marks. (And also
remember to accept any precomposed accented colons...)
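
A minimal sketch of the second option, under some assumptions of mine:
text arrives in chunks, incoming text is normalized to NFD so that any
precomposed character is split into base plus marks, and marks are
identified by general category. The class PromptWaiter and its method
names are hypothetical.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


class PromptWaiter:
    """Wait for a prompt, then drop combining marks that directly follow it."""

    def __init__(self, prompt: str = "login:") -> None:
        self.prompt = unicodedata.normalize("NFD", prompt)
        self.buffer = ""
        self.matched = False
        self.skipping_marks = False

    def feed(self, chunk: str) -> str:
        """Consume received text; return whatever follows the prompt."""
        if not self.matched:
            # NFD turns a precomposed accented character into base + marks.
            self.buffer = unicodedata.normalize("NFD", self.buffer + chunk)
            pos = self.buffer.find(self.prompt)
            if pos < 0:
                return ""
            self.matched = True
            self.skipping_marks = True
            chunk = self.buffer[pos + len(self.prompt):]
            self.buffer = ""
        if self.skipping_marks:
            i = 0
            while i < len(chunk) and is_mark(chunk[i]):
                i += 1
            if i < len(chunk):
                # A non-combining character ends the sequence.
                self.skipping_marks = False
            chunk = chunk[i:]
        return chunk


waiter = PromptWaiter()
for piece in ["log", "in:", "\u0301", "\u0302x"]:
    print(repr(waiter.feed(piece)), waiter.matched)
```

Note that the engine reports the prompt as soon as "login:" is seen and
keeps discarding marks across later chunks, so it never has to wait for
an end point that may not come.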

-- 
Kevin Bracey, Senior Software Engineer
Pace Micro Technology plc                     Tel: +44 (0) 1223 518566
645 Newmarket Road                            Fax: +44 (0) 1223 518526
Cambridge, CB5 8PB, United Kingdom            WWW: http://www.acorn.co.uk/


