Re: regular expressions with unicode situation?

From: Ben Dougall (bend@freenet.co.uk)
Date: Wed Apr 23 2003 - 08:10:34 EDT

    Thanks very much for the info.

    On Tuesday, April 22, 2003, at 09:01 pm, Mark Davis wrote:

    > You might take a look at the Unicode website (http://www.unicode.org/),
    > in particular UTR #18: Unicode Regular Expression Guidelines. If you are
    > looking for Unicode-capable regex implementations, I'd suggest looking
    > at Perl and ICU.

    On Tuesday, April 22, 2003, at 09:29 pm, Addison Phillips [wM] wrote:

    > Hi Ben,
    >
    > Most regex engines can handle Unicode text for the trivial cases, such as
    > exact matching. The problem of creating regex that is useful in a Unicode
    > context (where specifying huge numbers of code points might come into play
    > otherwise or in which you want to use character properties specified by the
    > Unicode Character database) is a non-trivial exercise. The guidelines for
    > implementing Unicode regex are actually a Unicode Technical Report (not part
    > of the standard) which you can find here:
    > http://www.unicode.org/reports/tr18
    > .....
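
A rough illustration of the distinction drawn in the quoted message, for anyone following along. This is only a sketch in Python, which nobody in the thread mentioned; the sample string and the helper name chars_in_category are made up for the example. Python's standard re module has no \p{...} property syntax, so the property-based part falls back on the unicodedata module, which exposes data from the Unicode Character Database.

```python
import re
import unicodedata

# Hypothetical sample text: precomposed i-diaeresis, e-acute, Greek capital omega.
text = "na\u00efve caf\u00e9 \u03a9mega 123"

# Trivial case: exact matching of a literal string that contains non-ASCII text.
exact = re.search("caf\u00e9", text)

# Property-based case: select characters by their General_Category value
# instead of listing code points by hand.
def chars_in_category(s, category):
    """Return the characters of s whose Unicode General_Category equals category."""
    return [ch for ch in s if unicodedata.category(ch) == category]

uppercase_letters = chars_in_category(text, "Lu")  # uppercase letters
decimal_digits = chars_in_category(text, "Nd")     # decimal digits
```

Engines that follow the UTR #18 guidelines would typically let you write the property test directly in the pattern rather than filtering characters in a separate step as done here.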


