Re: New Public Review Issue: Proposed Update UTS #18

From: Mark Davis (mark.davis@icu-project.org)
Date: Tue Sep 25 2007 - 12:00:36 CDT

    "add as single letter" does not mean "always treat as single letter in
    Regex".
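
    A minimal sketch of that distinction, in Python. The digraph table holds
    only the Welsh examples named in the quoted message (CH, FF, NG), and
    the helper name split_letters is hypothetical, for illustration only.
    It contrasts splitting a word into alphabet letters under such a
    tailoring with what an ordinary, locale-insensitive regex dot matches.

```python
import re

# Digraphs the quoted message names as single letters in Welsh.
# Assumption: an illustrative, incomplete table, not a full tailoring.
WELSH_DIGRAPHS = ("ch", "ff", "ng")


def split_letters(word, digraphs=WELSH_DIGRAPHS):
    """Split a word into alphabet letters, keeping listed digraphs together."""
    # Try longer digraphs first, then fall back to any single character.
    alternatives = "|".join(
        re.escape(d) for d in sorted(digraphs, key=len, reverse=True)
    )
    pattern = re.compile(f"(?:{alternatives})|.", re.IGNORECASE | re.DOTALL)
    return pattern.findall(word)


word = "chwech"
letters = split_letters(word)        # letters under the digraph table
dot_matches = re.findall(".", word)  # what a plain regex dot matches
print(letters)
print(dot_matches)
```

    The alphabet can count "ch" as one letter while a regex dot still
    matches "c" and "h" separately; making the dot itself
    language-sensitive is the separate step that UTS #18 lists under
    Level 3.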

    Mark

    On 9/25/07, Marion Gunn <mgunn@egt.ie> wrote:
    >
    > Tricky? Perhaps so, Mark, but solutions are the name of the game. In
    > any case, we need to add CH as a single letter in both Welsh and
    > Breton, C'H as a single letter in Breton, FF as a single letter in
    > Welsh, NG as a single letter in Welsh, etc., in all implementations.
    > mg
    >
    > On 24 Sep 2007, at 20:52, Mark Davis wrote:
    >
    > > ...
    > > On the comment on "feasible" -- I think the reference there was to
    > > language/locale-sensitive regex. That involves a few things which
    > > are quite tricky, and are thus listed under Level 3 in UTS#18.
    > > - sensitivity: "aa" matches a-ring in Danish
    > > - language-sensitive ordering ranges: [a-z] doesn't include o-slash
    > >   in Danish
    > > - language-sensitive grapheme clusters: a dot matches "ch" in Slovak
    > > ...
    > > Few implementations try to handle locale-sensitivity except for
    > > POSIX (and that has significant problems in it). I wouldn't say
    > > that they are infeasible, but they are tricky.
    >
    > - -
    > Marion Gunn * EGTeo (Estab.1991)
    > 27 Páirc an Fhéithlinn, Baile an
    > Bhóthair, Co. Átha Cliath, Éire.
    > * mgunn@egt.ie * eamonn@egt.ie *
    >


