Re: 32'nd bit & UTF-8

From: Hans Aberg (haberg@math.su.se)
Date: Wed Jan 19 2005 - 17:51:30 CST

Next message: Hans Aberg: "Re: Subject: Re: 32'nd bit & UTF-8"

Previous message: Hans Aberg: "Re: UTF-8 'BOM' (was RE: Subject: Re: 32'nd bit & UTF-8)"
Maybe in reply to: Hans Aberg: "32'nd bit & UTF-8"
Next in thread: Kenneth Whistler: "Re: 32'nd bit & UTF-8"
Maybe reply: Philippe VERDY: "Re: Re: 32'nd bit & UTF-8"

At 19:35 +0100 2005/01/19, Philippe VERDY wrote:
>> From: "Arcane Jill"
>> As a programmer myself, I actually followed that explanation. But I wonder if
>> it's the right approach. Would it not be a more ... interesting ... approach,
>> to forget Flex, and instead write a brand new Unicode lexer generator which
>> generates a lexer that processes characters (not bytes)?
>
>Why not JFlex, a free GPL-licensed lexer generator on SourceForge?
>See <http://jflex.de/> for the documentation, download, and access to its
>development.
>
>Yes, it's not a direct replacement, because it is written in Java and
>generates Java, but it is still a basis for generating lexers that will
>compile as C++. It also has full Unicode support. The bad thing is its
>current limitation to 64K DFA states

There is a "Unicode" version of Flex that uses a 16-bit wchar_t. This
results in lookup tables indexed by 2^16 characters instead of 2^8. And it
still does not help with implementing the full Unicode range, which goes up
to U+10FFFF and so needs 21 bits.

> (but this could be patched by changing the internal representation for
> these tables)

This kind of table compression is what one would want to avoid. That is why
I started to think about the regular expression method.
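A minimal Python sketch of one reading of the regular expression method, assuming it means rewriting each Unicode character range as a regular expression over the UTF-8 bytes that encode it, so that an ordinary 8-bit Flex scanner with 256-entry tables can still match the full range up to U+10FFFF. The names utf8_sequences and to_flex_regex are hypothetical; the splitting rule is the standard decomposition of a code point range into UTF-8 byte ranges, not necessarily the exact procedure discussed in this thread. It assumes 0 <= lo, hi <= 0x10FFFF, drops the surrogates U+D800..U+DFFF, and assumes the generated patterns go into a scanner built with flex's 8bit option.

```python
# Sketch only: rewrite a Unicode code point range as a regular expression over
# UTF-8 bytes, so an 8-bit scanner can match it. Names are hypothetical.

# Largest code points encodable in 1, 2 and 3 UTF-8 bytes.
LEN_LIMITS = (0x7F, 0x7FF, 0xFFFF)


def utf8_sequences(lo, hi):
    """Split [lo, hi] (0 <= lo, hi <= 0x10FFFF) into byte-range sequences.

    Each sequence is a list of (min_byte, max_byte) pairs; the UTF-8 encoding
    of a code point in the range falls in exactly one sequence.
    """
    result = []
    stack = [(lo, hi)]
    while stack:
        lo, hi = stack.pop()
        if lo > hi:
            continue
        # Surrogates U+D800..U+DFFF have no UTF-8 encoding: cut them out.
        if lo <= 0xDFFF and hi >= 0xD800:
            stack.append((0xE000, hi))
            stack.append((lo, 0xD7FF))
            continue
        # Split where the encoded length changes (1, 2, 3, 4 bytes).
        limit = next((m for m in LEN_LIMITS if lo <= m < hi), None)
        if limit is not None:
            stack.append((limit + 1, hi))
            stack.append((lo, limit))
            continue
        n = len(chr(lo).encode("utf-8"))
        # Split until the trailing bytes cover aligned blocks, so the range
        # is exactly the product of independent per-byte ranges.
        for i in range(1, n):
            m = (1 << (6 * i)) - 1  # bits carried by the last i bytes
            if (lo & ~m) != (hi & ~m):
                if lo & m:
                    stack.append(((lo | m) + 1, hi))
                    stack.append((lo, lo | m))
                    break
                if (hi & m) != m:
                    stack.append((hi & ~m, hi))
                    stack.append((lo, (hi & ~m) - 1))
                    break
        else:
            first, last = chr(lo).encode("utf-8"), chr(hi).encode("utf-8")
            result.append(list(zip(first, last)))
    return result


def to_flex_regex(lo, hi):
    """Render the sequences in Flex pattern syntax using \\xHH escapes."""
    def byte_class(a, b):
        return f"\\x{a:02X}" if a == b else f"[\\x{a:02X}-\\x{b:02X}]"
    alternatives = ("".join(byte_class(a, b) for a, b in seq)
                    for seq in utf8_sequences(lo, hi))
    return "(" + "|".join(alternatives) + ")"


if __name__ == "__main__":
    print(to_flex_regex(0x3B1, 0x3C9))  # Greek small alpha..omega
    print(to_flex_regex(0x0, 0x10FFFF))  # every Unicode scalar value
```

Run as a script, the first line printed should be (\xCE[\xB1-\xBF]|\xCF[\x80-\x89]), i.e. Greek small alpha to omega as two byte-level alternatives, and the whole range 0..U+10FFFF comes out as nine alternatives. The lexer's alphabet stays at 256 bytes; the cost moves into longer patterns and extra DFA states rather than wider tables.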

Hans Aberg


This archive was generated by hypermail 2.1.5 : Wed Jan 19 2005 - 17:53:14 CST