Re: regular expressions with unicode situation?

From: Mark Davis (
Date: Tue Apr 22 2003 - 16:01:40 EDT


    You might take a look at the Unicode website (, in
    particular UTR #18: Unicode Regular Expression Guidelines. If you are
    looking for Unicode-capable regex implementations, I'd suggest looking at
    Perl and ICU.
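    As a rough illustration (not part of the original reply), here is a sketch that uses Python 3's standard `re` module in place of Perl or ICU. In Python 3, patterns applied to `str` are Unicode-aware, so `\w` matches letters and digits in any script, and `re.IGNORECASE` follows Unicode case rules rather than ASCII-only ones. A combining accent is not a word character, though, so the sketch normalizes text to NFC before matching. The helper name `words` and the sample strings are hypothetical.

```python
import re
import unicodedata

# Python 3 str patterns are Unicode-aware by default: \w matches any
# character Unicode classifies as alphanumeric, plus the underscore.
WORD = re.compile(r"\w+")


def words(text: str) -> list[str]:
    """Return runs of word characters after NFC normalization.

    Hypothetical helper for illustration only. NFC composes a sequence
    such as 'e' + U+0301 COMBINING ACUTE ACCENT into the single code
    point U+00E9, so the accented letter stays inside its word.
    """
    return WORD.findall(unicodedata.normalize("NFC", text))


if __name__ == "__main__":
    sample = "he\u0301llo мир κόσμος 日本語"
    # Unnormalized: the combining accent (category Mn) is not matched
    # by \w, so the first word is split in two.
    print(WORD.findall(sample))  # ['he', 'llo', 'мир', 'κόσμος', '日本語']
    # Normalized first: the accent is composed into the letter.
    print(words(sample))         # ['héllo', 'мир', 'κόσμος', '日本語']
    # Case-insensitive matching works beyond ASCII (Cyrillic here).
    print(bool(re.fullmatch("МИР", "мир", re.IGNORECASE)))  # True
```

    Perl and ICU go further than Python's built-in `re`. For example, they support Unicode property classes such as `\p{L}`, which `re` does not. UTR #18 discusses these kinds of requirements in more detail.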

    (مرقص بن داود)
    IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
    (408) 256-3148
    fax: (408) 256-0799

    ----- Original Message -----
    From: "Ben Dougall" <>
    To: <>
    Sent: Tuesday, April 22, 2003 12:38
    Subject: regular expressions with unicode situation?

    > i'm just wondering if anyone can tell me what the general state of play
    > is at the moment regarding using regular expressions with unicode?
    > i'm not even completely sure if / how the two would fit together
    > completely or successfully? i've used regex in php, which was a version
    > of posix regex, and found it very useful. i'm now doing stuff on a mac
    > - os x (cocoa), and am starting work on an app that will analyse and
    > dissect text and am wondering if i can make use of regular expressions.
    > i want the app to work equally in all languages / character subsets. if
    > regex in general only covers small portions of unicode i don't think
    > it'll be so useful.
    > any general info regarding regex in conjunction with unicode much
    > appreciated. thanks.

    This archive was generated by hypermail 2.1.5 : Tue Apr 22 2003 - 16:33:25 EDT