Re: New Public Review Issue: Proposed Update UTS #18

From: Mike (mike-list@pobox.com)
Date: Mon Oct 01 2007 - 10:57:13 CST

    >> I think it's a bad idea for \q to have the side
    >> effect of changing the meaning of ".".
    >
    > Well, if you don't do that, then [^set\q{ch}] becomes inconsistent and does
    > not return the user-expected result, i.e. the exact complement of what
    > [set\q{ch}] matches, according to ".".

    No, there is no inconsistency. When my compiler encounters a
    character class, it creates a new matcher object for it; it
    doesn't use the "." matcher (a predefined object).

    > [...] as soon as you are introducing collation elements
    > in regexps, these are sorted by collation, and collations are
    > locale-sensitive...

    I don't see why they need to be sorted. All that matters is
    that you find the longest match. [a-z\q{ch}] will match "ch"
    in "chinchilla" rather than just "c".
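
    A minimal sketch of that longest-match rule, in Python. This is not
    my compiler's actual code; the SetMatcher class, its match_length
    method and the tokens list are hypothetical names made up for the
    illustration. It assumes a class holds code point ranges plus the
    literal strings given with \q{...}.

```python
class SetMatcher:
    # Hypothetical matcher object for a class such as [a-z\q{ch}].

    def __init__(self, ranges, strings):
        # ranges: list of (low, high) code point pairs, inclusive
        self.ranges = ranges
        # Try longer \q{...} strings first so the longest alternative wins.
        self.strings = sorted(strings, key=len, reverse=True)

    def match_length(self, text, pos):
        # Return the number of characters matched at pos, or 0 for no match.
        for s in self.strings:
            if s and text.startswith(s, pos):
                return len(s)
        if pos < len(text):
            cp = ord(text[pos])
            if any(lo <= cp <= hi for lo, hi in self.ranges):
                return 1
        return 0


matcher = SetMatcher(ranges=[(ord("a"), ord("z"))], strings=["ch"])
word = "chinchilla"

# What the class matches at the start of the word.
first = word[:matcher.match_length(word, 0)]

# Apply the class repeatedly across the whole word.
tokens = []
pos = 0
while pos < len(word):
    n = matcher.match_length(word, pos)
    if n == 0:
        break
    tokens.append(word[pos:pos + n])
    pos += n

print(first)
print(tokens)
```

    No sorting by collation is involved: the only ordering used is
    "longer strings before shorter ones".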

    > In addition, the meaning of ranges in sets like [a-z] should also be
    > consistent with the collation used...

    I disagree with this. I think that having [a-z] magically
    mean all characters in a particular language is asking for
    trouble. In French, would you say that [a-z] should match
    C WITH CEDILLA or A + ACUTE?

    It's my opinion that ranges inside [] should be simple binary
    order. If you want to do anything fancier, there should be
    new syntax for it.
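
    To illustrate what "simple binary order" means here, a small Python
    sketch that treats a range as a plain code point comparison. The
    in_range helper is a hypothetical name for the illustration, and
    binary order is assumed to mean code point order.

```python
def in_range(ch, low, high):
    # True if the single character ch lies between low and high by code point.
    return ord(low) <= ord(ch) <= ord(high)


c_cedilla = "\u00e7"   # LATIN SMALL LETTER C WITH CEDILLA
a_acute = "a\u0301"    # "a" followed by COMBINING ACUTE ACCENT

plain = in_range("m", "a", "z")
cedilla = in_range(c_cedilla, "a", "z")
# Each code point of the decomposed sequence is tested on its own.
decomposed = [in_range(c, "a", "z") for c in a_acute]

print(plain, cedilla, decomposed)
```

    Under this rule [a-z] covers U+0061 through U+007A and nothing
    else, whatever the locale: C WITH CEDILLA (U+00E7) falls outside,
    and in "a" + COMBINING ACUTE only the base letter falls inside.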

    Mike


