Re: Unicode lexer

From: Frank Yung-Fong Tang
Date: Wed Apr 20 2005 - 08:23:57 CST

    I think the first question we need to answer is how to define a
    "Unicode Enabled Lexer".

    I don't have a good answer, but I think it should at least include the
    following:

    1. The ability to scan UTF-8 (and/or UTF-16) input files.
    2. The ability to return tokens in one or more Unicode transformation
    formats.
    3. The ability to handle some set of Unicode regular expressions.
    4. The ability to support programming-language-specific Unicode
    'escape' sequences (\uHHHH, &#ddddd;, &#xxxxx;, \HHHHH, etc.). The lexer
    may not support them directly, but it should let the caller of the lexer
    define a way to deal with them.
    5. The use of some Unicode-based string data type as the primitive data
    type for returning the result in the token.
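
    To make the list above concrete, here is a rough sketch in Python of
    what such a lexer might look like. It is only an illustration under my
    own assumptions, not a proposed design: the names Token, lex and
    decode_backslash_u, and the tiny token grammar, are all made up for
    this example. It decodes UTF-8 or UTF-16 input (point 1), relies on the
    Unicode-aware \w, \d and \s of Python's re module (point 3), lets the
    caller supply the escape handling (point 4), and returns token text as
    Python str, which is a sequence of Unicode code points (point 5).

```python
import html
import re
from typing import Callable, Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str  # Python str holds Unicode code points, not bytes


# For str patterns, \w, \d and \s are Unicode-aware by default in Python 3.
TOKEN_RE = re.compile(
    r'(?P<IDENT>[^\W\d]\w*)'          # letter or underscore, then word chars
    r'|(?P<NUMBER>\d+)'
    r'|(?P<STRING>"(?:[^"\\]|\\.)*")'  # double-quoted, backslash escapes
    r'|(?P<WS>\s+)'
    r'|(?P<OP>.)',                     # any other single character
    re.DOTALL,
)

BACKSLASH_U_RE = re.compile(r'\\u([0-9A-Fa-f]{4})')


def decode_backslash_u(body: str) -> str:
    """Default escape handler (hypothetical): expands \\uHHHH only."""
    return BACKSLASH_U_RE.sub(lambda m: chr(int(m.group(1), 16)), body)


def lex(
    data: bytes,
    encoding: str = "utf-8",
    unescape: Callable[[str], str] = decode_backslash_u,
) -> Iterator[Token]:
    """Decode the input bytes, then yield tokens, skipping whitespace.

    The caller chooses the input encoding and the escape handler, so the
    lexer itself does not need to know any language-specific escape syntax.
    """
    text = data.decode(encoding)
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        value = match.group()
        if kind == "WS":
            continue
        if kind == "STRING":
            value = unescape(value[1:-1])  # strip quotes, expand escapes
        yield Token(kind, value)


if __name__ == "__main__":
    source = 'größe = "caf\\u00e9"'.encode("utf-8")
    for token in lex(source):
        print(token)
    # A caller that wants &#ddddd; / &#xhhhh; handling can pass its own
    # function, for example the standard library's html.unescape.
    for token in lex('"caf&#xe9;"'.encode("utf-8"), unescape=html.unescape):
        print(token)
```

    For point 2, a caller that needs the token in a particular
    transformation format can encode it on the way out, for example
    token.text.encode("utf-16-le"). Whether the lexer should do that itself
    or leave it to the caller is a design choice this sketch does not try
    to settle.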

    Frank Yung-Fong Tang
    Šýšţém Årçĥîţéçţ
