Re: 32'nd bit & UTF-8

From: Hans Aberg
Date: Wed Jan 19 2005 - 17:51:30 CST


    At 19:35 +0100 2005/01/19, Philippe VERDY wrote:
    >> De : "Arcane Jill"
    >> As a programmer myself, I actually followed that explanation. But I wonder if
    >> it's the right approach. Would it not be a more ... interesting ... approach,
    >> to forget Flex, and instead write a brand new Unicode lexer generator which
    >> generates a lexer that processes characters (not bytes)?
    >Why not JFlex, a free GPL-licenced lexer on SourceForge?
    >See <> for the documentation, download, and access to its
    >Yes it's not a direct replacement, because it is written in Java for Java, but
    >this is still a base to generate lexers that will compile with C++. Also it has
    >full Unicode support. The bad thing is its current limitation to 64K DFA states

    There is a "Unicode" version of Flex, using a 16-bit wchar_t. This then
    results in lookup tables with 2^16 entries. So this does not help in
    implementing the full Unicode range.
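
    As a rough illustration of the sizes involved, here is a minimal sketch
    in Python. It assumes one directly indexed table row per DFA state, which
    is a simplification and not Flex's actual table layout; the names
    BMP_SIZE, UNICODE_SIZE and table_entries are made up for this example.

```python
# Hypothetical model: one directly indexed row per DFA state.
BMP_SIZE = 2 ** 16             # code units reachable with a 16-bit wchar_t
UNICODE_SIZE = 0x10FFFF + 1    # code points in the full Unicode range


def table_entries(num_states, alphabet_size):
    """Entries in an uncompressed transition table (states x alphabet)."""
    return num_states * alphabet_size


if __name__ == "__main__":
    for name, size in (("16-bit", BMP_SIZE), ("full Unicode", UNICODE_SIZE)):
        print(name, size, "entries per state,",
              table_entries(100, size), "entries for 100 states")
```

    The full range is 17 times the 16-bit one, so a table indexed by a
    16-bit value cannot address it, and a directly indexed table for the
    full range grows in proportion.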

    > (but this could be patched by changing the internal representation for these

    This table compression is what one would want to avoid. Therefore I started
    to think about the regular expression method.
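
    A minimal sketch of one form such a regular expression method could
    take, assuming it means rewriting a range of Unicode code points as an
    alternation of byte-level character classes over UTF-8, so that a
    byte-oriented lexer with 256-entry tables can be kept. This is an
    illustration in Python, not code from Flex or JFlex; the names
    utf8_sequences and to_byte_regex are made up for this example.

```python
# Largest code point encodable in 1, 2, 3 and 4 UTF-8 bytes.
MAX_BY_LENGTH = (0x7F, 0x7FF, 0xFFFF, 0x10FFFF)


def utf8_sequences(lo, hi):
    """Split code points lo..hi into lists of (low_byte, high_byte) pairs.

    Each list describes one UTF-8 byte sequence pattern; together the
    lists match exactly the encodings of lo..hi, surrogates excluded.
    """
    if lo > hi:
        return []
    # Surrogates U+D800..U+DFFF have no UTF-8 encoding: cut them out.
    if lo <= 0xDFFF and hi >= 0xD800:
        return utf8_sequences(lo, 0xD7FF) + utf8_sequences(0xE000, hi)
    # Split where the encoded length changes.
    for m in MAX_BY_LENGTH[:-1]:
        if lo <= m < hi:
            return utf8_sequences(lo, m) + utf8_sequences(m + 1, hi)
    if hi <= 0x7F:
        return [[(lo, hi)]]
    # Split until every continuation byte position covers a full or
    # shared range, so the pattern is a simple product of byte ranges.
    for i in range(1, 4):
        m = (1 << (6 * i)) - 1
        if (lo & ~m) != (hi & ~m):
            if (lo & m) != 0:
                return (utf8_sequences(lo, lo | m)
                        + utf8_sequences((lo | m) + 1, hi))
            if (hi & m) != m:
                return (utf8_sequences(lo, (hi & ~m) - 1)
                        + utf8_sequences(hi & ~m, hi))
    a = chr(lo).encode("utf-8")
    b = chr(hi).encode("utf-8")
    return [list(zip(a, b))]


def to_byte_regex(lo, hi):
    """Render the byte sequences for lo..hi as a byte-level regex string."""
    def cls(a, b):
        return f"\\x{a:02X}" if a == b else f"[\\x{a:02X}-\\x{b:02X}]"
    return "|".join("".join(cls(a, b) for a, b in seq)
                    for seq in utf8_sequences(lo, hi))


if __name__ == "__main__":
    print(to_byte_regex(0x0, 0x10FFFF))
```

    With this kind of rewriting the lexer tables stay at 256 columns
    whatever the code point range, at the cost of more DFA states.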

      Hans Aberg
