Re: Unicode lexer

From: Hans Aberg (haberg@math.su.se)
Date: Wed Apr 20 2005 - 18:55:09 CST

Next message: Peter R. Mueller-Roemer: "Re: Unicode Bloopers"

Previous message: Hans Aberg: "Re: Unicode lexer"
In reply to: Tom Emerson: "Re: Unicode lexer"
Next in thread: Hans Aberg: "Re: Unicode lexer"

At 20:27 -0400 2005/04/20, Tom Emerson wrote:
>UTF-8 is a solution to the problem, though the depth of the automata
>increases and you may end up having to convert your existing UTF-16/32
>buffers to UTF-8 for lexing, then back again, dealing all the while
>with returning correct offsets during error processing. PCRE, for
>example, works in UTF-8, so if you want to use it on a UTF-16 buffer
>you need to convert both ways. A RPITA.

There is no problem using UTF-16/32 directly either, as they will
merely be interpreted as byte sequences. UTF-16 is quite irregular, and
is harder to use because of that. So a translator to UTF-8/32 is
probably preferable. Then, UTF-8 will probably win over UTF-32, as it
encodes ASCII in single bytes, using only the low 7 bits.
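
To make the comparison concrete, here is a minimal sketch in Python of
the byte sequences a byte-oriented lexer would see for the same
character in each encoding form. The helper name byte_forms and the
sample characters are chosen only for illustration, and the big-endian
variants are an arbitrary choice to keep the output free of a byte
order mark.

```python
def byte_forms(ch):
    """Return the bytes of one character in UTF-8, UTF-16 and UTF-32.

    Hypothetical helper for illustration; big-endian forms are used so
    that no byte order mark is emitted.
    """
    return {name: ch.encode(name) for name in ("utf-8", "utf-16-be", "utf-32-be")}


if __name__ == "__main__":
    # One ASCII letter, two other BMP characters, one character beyond the BMP.
    for ch in ("A", "\u00e9", "\u20ac", "\U0001d11e"):
        forms = byte_forms(ch)
        shown = ", ".join(f"{name}: {data.hex(' ')}" for name, data in forms.items())
        print(f"U+{ord(ch):04X}  {shown}")
```

In UTF-8 the ASCII letter stays a single byte with the high bit clear,
so existing ASCII lexer rules carry over unchanged. In UTF-16 the
character beyond the BMP becomes a surrogate pair of two 16-bit units,
which is probably the kind of irregularity that makes it harder to
handle, and in UTF-32 every character costs four bytes.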

-- 
   Hans Aberg


This archive was generated by hypermail 2.1.5 : Wed Apr 20 2005 - 19:01:34 CST