RE: New Public Review Issue: Proposed Update UTS #18

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Oct 01 2007 - 09:40:41 CST

Next message: Mike: "Re: New Public Review Issue: Proposed Update UTS #18"

Previous message: Mike: "Re: New Public Review Issue: Proposed Update UTS #18"
In reply to: Mike: "Re: New Public Review Issue: Proposed Update UTS #18"
Next in thread: Mike: "Re: New Public Review Issue: Proposed Update UTS #18"
Reply: Mike: "Re: New Public Review Issue: Proposed Update UTS #18"

> -----Original Message-----
> From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
> Behalf Of Mike
> Sent: Monday, 1 October 2007 15:59
> To: 'Unicode'
> Subject: Re: New Public Review Issue: Proposed Update UTS #18
>
> > But note that with my notation /\q{ch}./ would NOT be equivalent to /ch./
> > - the latter regexp will match only 3 characters: /c/ followed by /h/
> > followed by what /./ matches by default (i.e. [\u0000-\u10FFFF] minus the
> > set of line terminators, which depends on the single-line or multi-line
> > mode in effect, and which I'll denote \R).
> > - the former regexp extends the input universe (matched by ".") by making
> > it [\u0000-\u10FFFF\q{ch}] (so that it now contains /c/ or /h/ or the
> > sequence /ch/).
>
> I'll say it again. I think it's a bad idea for \q to have the side
> effect of changing the meaning of ".".

Well, if you don't do that, then [^set\q{ch}] becomes inconsistent and does
not return the result the user expects, i.e. the exact complement of what
[set\q{ch}] matches, relative to what "." matches.
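
To make the inconsistency concrete, here is a minimal Python sketch. It is
only an illustration: the helper names (head_unit, match_set, match_negated)
and the extend_universe flag are hypothetical, not part of UTS #18 or of any
regex engine, and matching is only attempted at the start of the text.

```python
def head_unit(text, sequences=()):
    """First matching unit of text: the longest listed sequence, else one code point."""
    for seq in sorted(sequences, key=len, reverse=True):
        if text.startswith(seq):
            return seq
    return text[:1]


def match_set(text, members):
    """[members]: the longest member found at the start of text, or None."""
    hits = [m for m in members if m and text.startswith(m)]
    return max(hits, key=len) if hits else None


def match_negated(text, members, extend_universe):
    """[^members]: the head unit of text if it is not a member, else None.

    With extend_universe=True, multi-character members such as "ch" are also
    units of the universe that "." ranges over (the assumption argued for here).
    """
    sequences = [m for m in members if len(m) > 1] if extend_universe else []
    unit = head_unit(text, sequences)
    return unit if unit and unit not in members else None


members = {"a", "b", "ch"}  # stands for [ab\q{ch}]

print(match_set("chico", members))                             # ch
print(match_negated("chico", members, extend_universe=False))  # c
print(match_negated("chico", members, extend_universe=True))   # None
```

With a universe of single code points, both the set and its negation match
at the start of "chico" (as "ch" and as "c"), so the negated set is not a
true complement. Once "ch" is a unit of the universe, only the positive set
matches there.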

> > For example to match all 3-letter words in Spanish between c and d
> > (inclusive, but "c" and "d" won't match because they are not 3 letters)
> > one would use /(?locale=es:(?range:c:d:...))/
>
> This seems to be way beyond what I think regular expressions are for.
> Maybe you should create a little text matching language....

I proposed it because, as soon as you introduce collation elements into
regexps, those elements are ordered by collation, and collations are
locale-sensitive...

In addition, the meaning of ranges in sets like [a-z] should also be
consistent with the collation used...
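
As a rough illustration of a range that follows collation order rather than
code point order, consider the sketch below. The ES_TRADITIONAL table is a
hand-written simplification of the traditional Spanish alphabet (with "ch"
and "ll" as separate letters), not real collation data, and both function
names are made up for this example.

```python
# Simplified traditional Spanish alphabet; "\u00f1" is n with tilde.
ES_TRADITIONAL = (
    "a b c ch d e f g h i j k l ll m n \u00f1 o p q r s t u v w x y z"
).split()
RANK = {element: i for i, element in enumerate(ES_TRADITIONAL)}


def in_collation_range(element, lo, hi, rank=RANK):
    """True if element sorts between lo and hi (inclusive) in the tailored order."""
    return rank[lo] <= rank[element] <= rank[hi]


def in_codepoint_range(char, lo, hi):
    """True if the single character lies between lo and hi by code point."""
    return ord(lo) <= ord(char) <= ord(hi)


print(in_collation_range("ch", "c", "d"))      # True: c < ch < d
print(in_collation_range("\u00f1", "n", "o"))  # True: n < n-tilde < o
print(in_codepoint_range("\u00f1", "n", "o"))  # False: U+00F1 sorts after "z"
```

Under the tailored order, [c-d] contains "ch" and [n-o] contains n with
tilde. Under plain code point order neither holds, since "ch" is not a
single element and U+00F1 lies outside the ASCII letters.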

Now if a regexp is locale-sensitive (due to collation), there should be a
way to create a regexp that is not, i.e. to specify explicitly the locale in
use.

If you don't want locale-sensitive regexps, you have to specify that your
implementation supports only one locale, and the simplest locale is the "C"
locale, with binary ordering of every collation element!
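
One way to picture an explicitly specified locale is a range constructor
that takes the locale as a parameter and defaults to "C". This is only a
sketch under stated assumptions: make_range, the ORDERS table and the
"es-traditional" key are hypothetical names, and the table is a simplified
stand-in for real collation data.

```python
# Hypothetical per-locale orderings; "C" is handled separately as binary order.
ORDERS = {
    "es-traditional": (
        "a b c ch d e f g h i j k l ll m n \u00f1 o p q r s t u v w x y z"
    ).split(),
}


def make_range(lo, hi, locale="C"):
    """Return a predicate testing whether an element falls in [lo-hi].

    In the "C" locale only single code points qualify and they are compared
    by binary value; any other locale uses its table from ORDERS.
    """
    if locale == "C":
        lo_cp, hi_cp = ord(lo), ord(hi)
        return lambda e: len(e) == 1 and lo_cp <= ord(e) <= hi_cp
    rank = {e: i for i, e in enumerate(ORDERS[locale])}
    return lambda e: e in rank and rank[lo] <= rank[e] <= rank[hi]


c_to_d_binary = make_range("c", "d")                     # locale-independent
c_to_d_spanish = make_range("c", "d", "es-traditional")  # locale made explicit

print(c_to_d_binary("ch"))   # False
print(c_to_d_spanish("ch"))  # True
```

The same pattern text [c-d] then has one fixed meaning per locale, and the
"C" variant behaves identically everywhere.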

All this is needed for consistency!


This archive was generated by hypermail 2.1.5 : Mon Oct 01 2007 - 09:45:53 CST