Unicode Lookahead in Parsers?

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri Aug 30 1996 - 18:32:44 EDT

Next message: Martin J Duerst: "Re: Letters vs. precomposed characters"
Previous message: Kenneth Whistler: "Where precomposed characters came from"
Next in thread: Keld Jørn Simonsen: "Re: Unicode Lookahead in Parsers?"
Maybe reply: Keld Jørn Simonsen: "Re: Unicode Lookahead in Parsers?"
Maybe reply: David Goldsmith: "Re: Unicode Lookahead in Parsers?"

Dan Oscarsson writes in response to Arnt:

>> Putting combining characters before the non-combining character would
>> make such speculative rendering impossible.
>>
>Agreed, for GUI representation it would be ok. But not when having a
>command/program parser. Generally, when writing parsers, you want to
>avoid as much look ahead as possible. Unicode forces you to always
>read at least one character ahead.

Nope, not even for parsers is this a problem. With a correct extension
of the identifier syntax, there is no additional cost over what
a parser (or more accurately, the lexer portion of the parser)
currently has to do. Once you transition to the state
which is accumulating a token for an identifier, you sit in a loop
of the form:

    while ( isIdentifierPart (*s) )
        *tk++ = *s++;

The entire trick is in specifying the identifier correctly. The
implementation guidelines published in the Unicode Standard 2.0
include a section which spells out a complete suggested BNF syntax
for identifiers which can be used to generate an efficient one-step
table lookup underneath an isIdentifierPart() implementation.
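
To make that concrete, here is a minimal sketch in C of a table-driven
isIdentifierPart() and the lexer loop around it. Everything beyond the
name isIdentifierPart is an assumption made for illustration: the tiny
hand-filled table, the names id_class, init_id_table and scan_identifier,
and the use of 32-bit code points. A real table would be generated from
the Unicode character data, not written by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative character classes (assumption, not from the standard). */
enum { ID_NONE = 0, ID_START = 1, ID_EXTEND = 2 };

/* Covers U+0000 through the Combining Diacritical Marks block only. */
#define TABLE_SIZE 0x0370u

static unsigned char id_class[TABLE_SIZE];

static void mark(uint32_t lo, uint32_t hi, unsigned char cls)
{
    for (uint32_t c = lo; c <= hi; c++)
        id_class[c] = cls;
}

/* Fill the table once; only a few sample ranges are populated here. */
static void init_id_table(void)
{
    mark('A', 'Z', ID_START);
    mark('a', 'z', ID_START);
    mark('_', '_', ID_START);
    mark('0', '9', ID_EXTEND);
    mark(0x00C0, 0x00D6, ID_START);   /* Latin-1 letters, skipping U+00D7 */
    mark(0x00D8, 0x00F6, ID_START);   /* skipping U+00F7 */
    mark(0x00F8, 0x00FF, ID_START);
    mark(0x0300, 0x036F, ID_EXTEND);  /* combining marks, treated as a block */
}

static int isIdentifierStart(uint32_t c)
{
    return c < TABLE_SIZE && id_class[c] == ID_START;
}

/* One table lookup per character; combining marks are just another class. */
static int isIdentifierPart(uint32_t c)
{
    return c < TABLE_SIZE && id_class[c] != ID_NONE;
}

/* Copies one identifier from the 0-terminated input s into tk, storing at
   most cap - 1 code points plus a 0 terminator. Returns the number of code
   points copied, or 0 if s does not begin with an identifier start. */
static size_t scan_identifier(const uint32_t *s, uint32_t *tk, size_t cap)
{
    size_t n = 0;
    assert(cap > 0);
    if (!isIdentifierStart(s[0])) {
        tk[0] = 0;
        return 0;
    }
    while (isIdentifierPart(s[n]) && n + 1 < cap) {
        tk[n] = s[n];
        n++;
    }
    tk[n] = 0;
    return n;
}
```

In this sketch the only character examined beyond the token is the one
that ends it, which an ASCII-only lexer has to examine as well. A
combining mark that follows its base character simply keeps the loop
going, and one that appears with no base character before it is rejected
as an identifier start.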

Check with the Java implementers. They're not complaining about
combining characters causing inefficiencies in the lexer.

--Ken Whistler

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:31 EDT