Re: A versus A-thing & lookahead...

From: Mark Leisher
Date: Thu Aug 29 1996 - 18:01:19 EDT

    Michael> At 14:47 1996-08-28, unicode@Unicode.ORG wrote:
>>> Yes, but it's not a once-off look, is it? Because you can stack
>>> combining characters. So you know it's not an A, but you have to keep
>>> looking and looking and looking, don't you? Doesn't this make
>>> processing much more complex than Level 1 processing?
>> Complex for WHAT? What are you trying to do?

    Michael> Sorting and double-click word selection, for instance. Global
    Michael> recognition for find-and-replace.

I can say from experience that a consistent internal representation makes these
concerns almost irrelevant. Take, for instance, a search operation. If the
document and the search string are both held in the same consistent internal
representation, then a string that occurs in the document will normally match,
with no (or very little) processing beyond the search itself.

>> Iteration is simply iteration... Computers do it better than we do, so
>> what if you have to keep iterating on something? Infinite lookahead is
>> not "difficult" and it doesn't make anything more complex; it just takes
>> the machine more iterations...

    Michael> No loss of processing efficiency?

There is a small loss of efficiency, but with a correct implementation the
impact is almost negligible. I have not actually done an analysis, but my
intuition is that the majority of cases will have at most 4 combining
characters, with the worst case being that interesting Arabic ligature
composed from around 19 characters.
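
The lookahead under discussion can be sketched as a short loop. This is a
simplified illustration in Python, not a full text segmentation algorithm: it
assumes a combining character is any character whose canonical combining class
(as reported by unicodedata.combining) is non-zero, which misses some marks.
The names combining_run_end and clusters are hypothetical.

```python
import unicodedata

def combining_run_end(text: str, start: int) -> int:
    """Return the index just past the character at `start` and every
    combining character stacked on it."""
    i = start + 1
    # Keep looking while the next character is a combining character.
    while i < len(text) and unicodedata.combining(text[i]) != 0:
        i += 1
    return i

def clusters(text: str) -> list:
    """Split text into base-plus-combining-marks units."""
    result = []
    i = 0
    while i < len(text):
        j = combining_run_end(text, i)
        result.append(text[i:j])
        i = j
    return result

# "A" with COMBINING RING ABOVE and COMBINING ACUTE ACCENT, then "B".
sample = "A\u030a\u0301B"
units = clusters(sample)
```

Each character is examined once, so the cost of the scan grows with the length
of the text and not with how deeply the marks are stacked.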

Four combining characters do not represent much of an overhead with a good
composition table implementation (e.g. some form of tree).
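
One possible shape for such a tree is a trie keyed on the characters of each
sequence, so that a lookup costs one step per character consumed. The sketch
below is in Python; the three-entry TABLE and the names build_trie and
compose_longest are made up for the example, and a real table would be
generated from the character database.

```python
def build_trie(table: dict) -> dict:
    """Build a nested-dict trie from {sequence: composed_character}."""
    root = {}
    for sequence, composed in table.items():
        node = root
        for ch in sequence:
            node = node.setdefault(ch, {})
        node[None] = composed  # None marks "a sequence ends here"
    return root

def compose_longest(trie: dict, text: str, start: int) -> tuple:
    """Return (character, end_index) for the longest table sequence that
    begins at `start`, or the character itself if nothing matches."""
    node = trie
    best = (text[start], start + 1)
    i = start
    while i < len(text) and text[i] in node:
        node = node[text[i]]
        i += 1
        if None in node:
            best = (node[None], i)
    return best

# Hypothetical miniature composition table.
TABLE = {
    "A\u030a": "\u00c5",        # A + ring above
    "A\u030a\u0301": "\u01fa",  # A + ring above + acute
    "e\u0301": "\u00e9",        # e + acute
}
TRIE = build_trie(TABLE)
```

With this layout, composing a base character with four combining characters
takes at most five dictionary lookups, which is consistent with the overhead
being small.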

Mark Leisher                    "A designer knows he has achieved perfection
Computing Research Lab           not when there is nothing left to add, but
New Mexico State University      when there is nothing left to take away."
Box 30001, Dept. 3CRL                       -- Antoine de Saint-Exupéry
Las Cruces, NM 88003
