Dan Oscarsson writes in response to Arnt:
>> Putting combining characters before the non-combining character would
>> make such speculative rendering impossible.
>Agreed, for GUI representation it would be ok. But not when having a
>command/program parser. Generally, when writing parsers, you want to
>avoid as much look ahead as possible. Unicode forces you to always
>read at least one character ahead.
Nope, this is not a problem even for parsers. With a correct extension
of the identifier syntax, there is no additional cost over what
a parser (or more accurately, the lexer portion of the parser)
currently has to do. Once you transition to the state
which is accumulating a token for an identifier, you sit in a loop
of the form:
    while ( isIdentifierPart (*s) )
        *tk++ = *s++;
The entire trick is in specifying the identifier correctly. The
implementation guidelines published in the Unicode Standard 2.0
include a section which spells out a complete suggested BNF syntax
for identifiers which can be used to generate an efficient one-step
table lookup underneath an isIdentifierPart() implementation.
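
To make the table-lookup point concrete, here is a minimal sketch in C. It is
not the syntax from the Unicode Standard: the table covers only ASCII, the
Latin-1 letters, and the combining diacritical marks U+0300..U+036F, and the
names classify(), init_id_table(), isIdentifierStart() and scan_identifier()
are made up for illustration. A real table would be generated from the Unicode
character property data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical property bits for this sketch. */
enum { ID_START = 1, ID_PART = 2 };

/* Toy range: code points 0..0x036F only. Anything above is rejected. */
#define TABLE_SIZE 0x0370u
static unsigned char id_table[TABLE_SIZE];

/* Hand-written classification, used only to fill the table once. */
static unsigned classify(uint32_t c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
        return ID_START | ID_PART;
    if (c >= '0' && c <= '9')
        return ID_PART;
    if (c >= 0x00C0 && c <= 0x00FF && c != 0x00D7 && c != 0x00F7)
        return ID_START | ID_PART;   /* Latin-1 letters */
    if (c >= 0x0300 && c <= 0x036F)
        return ID_PART;              /* combining marks: part, never start */
    return 0;
}

static void init_id_table(void)
{
    for (uint32_t c = 0; c < TABLE_SIZE; c++)
        id_table[c] = (unsigned char)classify(c);
}

/* Each test is a bounds check plus a single table lookup. */
static int isIdentifierStart(uint32_t c)
{
    return c < TABLE_SIZE && (id_table[c] & ID_START);
}

static int isIdentifierPart(uint32_t c)
{
    return c < TABLE_SIZE && (id_table[c] & ID_PART);
}

/* Copy one identifier from the zero-terminated code point string s into tk
   (capacity cap, terminator included). Returns the number of code points
   copied, or 0 if s does not begin with an identifier. */
static size_t scan_identifier(const uint32_t *s, uint32_t *tk, size_t cap)
{
    size_t n = 0;
    assert(cap > 0);
    if (isIdentifierStart(s[0])) {
        while (isIdentifierPart(s[n]) && n + 1 < cap) {
            tk[n] = s[n];
            n++;
        }
    }
    tk[n] = 0;
    return n;
}
```

Call init_id_table() once before scanning. The loop in scan_identifier() only
ever tests the code point it is about to copy: because a combining mark follows
its base character, it is just one more identifier part, and the first code
point that fails the test ends the token exactly as a space or operator would.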
Check with the Java implementers. They're not complaining about
combining characters causing inefficiencies in the lexer.