Re: Unicode lexer

From: Tex Texin (tex@i18nguy.com)
Date: Wed Apr 20 2005 - 17:40:11 CST


    Hans, Tom,

    Hans, I can't provide many details just yet.
    Tom, you are right: it is the latter, Unicode identifiers and such. I'll
    look at the Python docs. Thanks for the tip.

    Tom Emerson wrote:
    >
    > Tex Texin writes:
    > > I would be interested in pointers to any papers, case studies etc. on
    > > migrating programming languages to be Unicode-enabled. (No sense
    > > repeating the sins of the past.)
    >
    > I would take a look at Python and the various specifications that were
    > written around its Unicode implementation. The guys who implemented it
    > did a fantastic job. Indeed, the implementation is pretty easy to read
    > as well, so you may just want to look at the code.
    >
    > There are, of course, a couple of levels of "Unicode-enablement"
    > within a programming language. Many moons ago I was involved with
    > working on the Unicode-enablement of Gwydion Dylan, though life
    > intervened and I had to stop. If "all" you need to do is provide
    > support for a Unicode string type, with appropriate transcoders, then
    > the task is considerably easier than if you are enabling the entire
    > language to allow Unicode identifiers, a la Java. Since you are asking
    > for a Unicode-enabled lexer, I assume the latter.
    >
    > I thought that Flex had been modified to deal with Unicode... I guess
    > that isn't the case.
    >
    > You don't mention the implementation language: whether it's C, C++,
    > Java, or something else entirely. That will certainly constrain your
    > choices.
    >
    > It may end up being easier to develop your own lexer from scratch,
    > rather than using Flex or another lexer generator. But again, without knowing more
    > about the problem, it's hard to say. FWIW I've taken this approach in
    > one project, and it worked well, especially given UAX #31 as a
    > starting point.
    >
    > -tree
    >
    > --
    > Tom Emerson Basis Technology Corp.
    > Software Architect http://www.basistech.com
    > "Beware the lollipop of mediocrity: lick it once and you suck forever"
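
    To make the hand-written lexer idea concrete, here is a minimal sketch of
    an identifier scanner in the spirit of UAX #31. It is an illustration only,
    not the approach Tom used. It assumes Python 3, where str.isidentifier()
    follows the XID_Start / XID_Continue properties (plus "_" as a start
    character). The names is_id_start, is_id_continue and tokenize are
    hypothetical, and folding identifiers to NFC is an assumption; a real
    language has to pick its own normalization policy.

```python
import unicodedata


def is_id_start(ch):
    # For a single character, str.isidentifier() is true when the character
    # is XID_Start or "_" (Python's own identifier rule).
    return ch.isidentifier()


def is_id_continue(ch):
    # Prefix a known start character so the test reduces to XID_Continue.
    return ("a" + ch).isidentifier()


def tokenize(text):
    """Split text into ("IDENT", name) and ("OTHER", char) tokens.

    Whitespace is skipped. Everything that is not part of an identifier is
    emitted one character at a time; a real lexer would add rules for
    numbers, strings, operators and comments.
    """
    tokens = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif is_id_start(ch):
            j = i + 1
            while j < n and is_id_continue(text[j]):
                j += 1
            # Assumption: identifiers are compared in NFC form.
            tokens.append(("IDENT", unicodedata.normalize("NFC", text[i:j])))
            i = j
        else:
            tokens.append(("OTHER", ch))
            i += 1
    return tokens
```

    Because combining marks are XID_Continue, "e" followed by U+0301 scans as
    one identifier and is folded to the precomposed form, so two spellings of
    the same name end up as the same token.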

    -- 
    -------------------------------------------------------------
    Tex Texin   cell: +1 781 789 1898   mailto:Tex@XenCraft.com
    Xen Master                          http://www.i18nGuy.com
                             
    XenCraft                            http://www.XenCraft.com
    Making e-Business Work Around the World
    -------------------------------------------------------------
    

