Re: Unicode lexer

From: Tex Texin (tex@i18nguy.com)
Date: Wed Apr 20 2005 - 03:05:38 CST

    Thanks for the replies to my question on Unicode-enabled lexers. Here is
    the list of suggestions I compiled:

    1) Patrick Andries: JavaCC can handle Unicode and has an integrated lexer,
    but it also includes a syntax parser.

    https://javacc.dev.java.net/doc/features.html

    2) Hans Aberg posted to the Flex list
         List-Archive: <http://lists.gnu.org/pipermail/help-flex>
    Haskell code that lets one generate Flex-like regular expressions from
    Unicode character number classes, so that the generated lexer parses
    your choice of UTF-8 or UTF-32 (big or little endian). So you might be
    able to use Flex or a similar lexer generator by entering those regular
    expressions by hand into the lexer source file.

    3) Gregg Reynolds:
    http://jflex.de/
    https://javacc.dev.java.net/

    4) Frank Tang:
    XFST is already Unicode-enabled
    http://www.stanford.edu/~laurik/fsmbook/home.html

    5) I also found a thread on this list in January 2005 that claimed:

    There are many lexer/scanner projects available on SourceForge.net. Many
    of them support Unicode. See, for example, the results page when
    searching for "lexer" in the SourceForge "software/group" category: See
    also the various references they contain for other similar open projects
    or commercial products.
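
    To illustrate the idea in item 2, here is a minimal sketch in Python. It
    is not Hans Aberg's Haskell code; it shows one common way to turn a range
    of Unicode code points into a byte-level regular expression that a
    byte-oriented generator such as Flex could match against UTF-8 input. The
    function names (utf8_sequences, byte_class, utf8_regex) are made up for
    this example, and the output uses \xHH escapes on the assumption that the
    target lexer generator accepts them.

```python
def utf8_sequences(lo, hi):
    """Yield lists of (low_byte, high_byte) pairs that together cover the
    UTF-8 encodings of every code point in lo..hi (inclusive)."""
    if lo > hi:
        return
    # Surrogates (U+D800..U+DFFF) have no valid UTF-8 encoding: skip them.
    if lo <= 0xDFFF and hi >= 0xD800:
        yield from utf8_sequences(lo, 0xD7FF)
        yield from utf8_sequences(0xE000, hi)
        return
    # Split where the encoded length changes (1, 2, 3, 4 bytes).
    for limit in (0x7F, 0x7FF, 0xFFFF):
        if lo <= limit < hi:
            yield from utf8_sequences(lo, limit)
            yield from utf8_sequences(limit + 1, hi)
            return
    # A pure ASCII range is a single byte range.
    if hi <= 0x7F:
        yield [(lo, hi)]
        return
    # Split until each trailing group of 6 bits is either identical in lo
    # and hi or spans the full 0x00..0x3F range, so that the code point
    # range maps onto one independent byte range per position.
    for i in range(1, 4):
        mask = (1 << (6 * i)) - 1
        if (lo & ~mask) != (hi & ~mask):
            if (lo & mask) != 0:
                yield from utf8_sequences(lo, lo | mask)
                yield from utf8_sequences((lo | mask) + 1, hi)
                return
            if (hi & mask) != mask:
                yield from utf8_sequences(lo, (hi & ~mask) - 1)
                yield from utf8_sequences(hi & ~mask, hi)
                return
    # Now lo and hi encode to the same number of bytes, and pairing their
    # bytes position by position gives the byte ranges.
    yield list(zip(chr(lo).encode("utf-8"), chr(hi).encode("utf-8")))


def byte_class(lo, hi):
    """Render one byte range as a regex atom using \\xHH escapes."""
    if lo == hi:
        return f"\\x{lo:02X}"
    return f"[\\x{lo:02X}-\\x{hi:02X}]"


def utf8_regex(lo, hi):
    """Return a byte-level regex matching the UTF-8 form of lo..hi."""
    return "|".join(
        "".join(byte_class(a, b) for a, b in seq)
        for seq in utf8_sequences(lo, hi)
    )


# Greek and Coptic block, U+0370..U+03FF
print(utf8_regex(0x0370, 0x03FF))
# prints: \xCD[\xB0-\xBF]|[\xCE-\xCF][\x80-\xBF]
```

    The printed expression could then be pasted by hand into a lexer source
    file as the definition of a character class. A UTF-32 variant would emit
    four fixed-width bytes per code point instead, in big- or little-endian
    order.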

    -- 
    -------------------------------------------------------------
    Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
    Xen Master                          http://www.i18nGuy.com
                             
    XenCraft		            http://www.XenCraft.com
    Making e-Business Work Around the World
    -------------------------------------------------------------
    

