Re: Matching Unicode strings and combining characters [was: basic

From: Geoffrey Waigh (anzu@home.com)
Date: Thu Sep 30 1999 - 14:41:02 EDT


Paul Keinanen wrote:
>
> On Thu, 30 Sep 1999 07:00:17 -0700 (PDT), "Reynolds, Gregg"
> <greynolds@datalogics.com> wrote:
>
> >I don't mean to be nasty (there are other threads for that ;) ), but this
> >subject has come up several times and for the life of me I can't see what's
> >so difficult about it.
>
> There is no problem as long as all the data is available at once, as is
> the case with a disk file, or when there is a low level protocol that
> guarantees that a complete abstract character (base + all combining
> marks) is delivered in one atomic unit, e.g. in a UDP frame.
>
> The problem starts when the bytes are not delivered as atomic units
> e.g. in asynchronous serial lines or TCP/IP.
 
This problem goes away as soon as you add framing to your data. If you
are trying to send text strings rather than bytes, you need to indicate
what constitutes a complete string. Many TCP/IP-based protocols already
do this. Several families of terminals work this way as well. Indeed,
if you use a terminal where you describe fields in terms of position and
length (not in 'character cell' terms), you suddenly become free to have
proportional/composing writing systems without destroying your screen
layout.
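
To make the framing point concrete, here is a minimal sketch in Python.
It is only an illustration under stated assumptions, not any particular
protocol: the 4-byte big-endian length prefix, the UTF-8 payload, and
the names frame, FrameReader and receive_all are all hypothetical. The
receiver only hands a string onward once the whole frame has arrived,
so a base character is never separated from its combining marks, no
matter how the byte stream was split in transit.

```python
import struct
from typing import Iterable, List

# Assumed wire format: 4-byte big-endian payload length, then UTF-8 bytes.
HEADER = struct.Struct(">I")


def frame(text: str) -> bytes:
    """Encode one complete string as a length-prefixed frame."""
    payload = text.encode("utf-8")
    return HEADER.pack(len(payload)) + payload


class FrameReader:
    """Reassembles complete strings from arbitrarily split chunks."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[str]:
        """Buffer received bytes; return every string completed by them."""
        self._buf.extend(chunk)
        done = []
        while len(self._buf) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # frame still incomplete; wait for more bytes
            # Decode only whole frames, so no character is ever cut in half.
            done.append(bytes(self._buf[HEADER.size:end]).decode("utf-8"))
            del self._buf[:end]
        return done


def receive_all(chunks: Iterable[bytes]) -> List[str]:
    """Feed chunks in arrival order and collect the completed strings."""
    reader = FrameReader()
    out = []
    for chunk in chunks:
        out.extend(reader.feed(chunk))
    return out
```

Feeding the reader one byte at a time, as a serial line might deliver
them, still yields "e" followed by U+0301 as a single complete string
rather than a bare "e" with the accent arriving later.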

> So Unicode is not supported in a real-time environment?

I did it several years ago. It is much easier when you accept that most
of the complexity in Unicode is there for good technical reasons and
work with it rather than twisting it to fit old ASCII programming
practices.

Geoffrey


