Re: Derived age regexp

From: Markus Scherer
Date: Fri Oct 15 2010 - 18:04:36 CDT

    On Fri, Oct 15, 2010 at 3:19 PM, Tim Greenwood wrote:

    > Is there any regular expression - in perl, or elsewhere, that enables
    > searching on the derived age? I want to find all characters in a file added
    > since Unicode 4.1.
    > I could write it all by processing against the derived age file, but it
    > would be nice if it is ready to go.

    You could use an ICU UnicodeSet or an ICU regular expression, for example
    with the pattern [[:^Cn:]&[:^age=4.1:]] (assigned characters that are not
    in Unicode 4.1).

    A (frozen) UnicodeSet with its span() or spanUTF8() method might suffice,
    depending on what you need.
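
    To illustrate the idea of scanning text with a set and a span-style
    method, here is a minimal sketch in plain Python. It is not ICU's API:
    the names NEWER_THAN_4_1, span and find_runs are hypothetical, and the
    set holds only two stand-in characters (U+1E9E, added in Unicode 5.1,
    and U+20B9, added in 6.0) rather than the full result of an age query.

```python
# Stand-in for a frozen set of "characters newer than Unicode 4.1".
# Only two members are listed; a real set would be built from age data.
NEWER_THAN_4_1 = frozenset({0x1E9E, 0x20B9})  # added in 5.1 and 6.0


def span(text, start, chars, contained):
    """Return the end index of the run beginning at `start` whose characters
    are all in `chars` (contained=True) or all outside it (contained=False)."""
    end = start
    while end < len(text) and (ord(text[end]) in chars) == contained:
        end += 1
    return end


def find_runs(text, chars):
    """Return (start, end) index pairs for each maximal run of set members."""
    runs = []
    pos = 0
    while pos < len(text):
        pos = span(text, pos, chars, contained=False)  # skip non-members
        if pos == len(text):
            break
        end = span(text, pos, chars, contained=True)   # measure member run
        runs.append((pos, end))
        pos = end
    return runs
```

    Calling find_runs(some_text, NEWER_THAN_4_1) returns the index ranges
    of the matching characters, so the rest of the text is skipped in
    whole runs instead of being tested one match at a time.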

    We also have dedicated API for the non-Unihan

    Note what UTS #18 says about [:age:]
    or \p{age} (which ICU implements):

    Caution: The DerivedAge data file in the UCD provides the deltas
    between versions, for compactness. However, when using the property all
    characters included in that version are included. Thus \p{age=3.0}
    includes the letter a, which was included in Unicode 1.0. To get
    characters that are new in a particular version, subtract off the
    previous version as described in 1.3 Subtraction and Intersection.
    For example: [\p{age=3.1} -- \p{age=3.0}]
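
    The difference between the file's deltas and the cumulative property
    can be sketched in plain Python. This is an illustration under stated
    assumptions, not ICU code: SAMPLE is a tiny hand-picked excerpt in the
    "range ; version" layout of the DerivedAge file (check it against the
    real file before relying on it), and parse_version, parse_derived_age,
    age_set and new_in are hypothetical helper names.

```python
# Tiny hand-picked excerpt in DerivedAge's "range ; version" layout.
# It is NOT the full file; it only illustrates the data shape.
SAMPLE = """\
0000..01F5    ; 1.1
20AC          ; 2.1
1E9E          ; 5.1
20B9          ; 6.0
"""


def parse_version(text):
    """Turn '4.1' into a comparable tuple (4, 1)."""
    return tuple(int(part) for part in text.split("."))


def parse_derived_age(data):
    """Yield (first, last, version) per data line, skipping comments."""
    for line in data.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        code_range, version = (field.strip() for field in line.split(";"))
        first, _, last = code_range.partition("..")
        yield int(first, 16), int(last or first, 16), parse_version(version)


def age_set(data, version):
    """Cumulative set, like \\p{age=version}: all code points assigned
    in that version or any earlier one."""
    limit = parse_version(version)
    return {cp for first, last, v in parse_derived_age(data)
            if v <= limit for cp in range(first, last + 1)}


def new_in(data, version, previous):
    """Delta, like [\\p{age=version} -- \\p{age=previous}]."""
    return age_set(data, version) - age_set(data, previous)
```

    With this excerpt, age_set(SAMPLE, "3.0") contains the letter a even
    though the file lists it under an earlier version, while
    new_in(SAMPLE, "6.0", "5.2") contains only U+20B9.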

    Best regards,
