Re: New Public Review Issue: Proposed Update UTS #18

From: Asmus Freytag ([email protected])
Date: Tue Oct 02 2007 - 11:07:37 CST

Next message: Philippe Verdy: "RE: Re[2]: marks (2 new symbols)"

Previous message: Philippe Verdy: "RE: Proposal for additional syntax (was Re: New Public Review Issue: Proposed Update UTS #18)"
In reply to: Mike: "Re: New Public Review Issue: Proposed Update UTS #18"
Next in thread: Michael Maxwell: "RE: New Public Review Issue: Proposed Update UTS #18"
Reply: Michael Maxwell: "RE: New Public Review Issue: Proposed Update UTS #18"
Reply: Philippe Verdy: "RE: New Public Review Issue: Proposed Update UTS #18"
Reply: Mike: "Re: New Public Review Issue: Proposed Update UTS #18"

On 10/1/2007 9:57 AM, Mike wrote:
>>> I think it's a bad idea for \q to have the side
>>> effect of changing the meaning of ".".
>>
>> Well, if you don't do that, then [^set\q{ch}] becomes inconsistent and
>> does not return the user-expected result, i.e. the exact complement of
>> what [set\q{ch}] matches, according to ".".
>
> No, there is no inconsistency. When my compiler encounters a
> character class, it creates a new matcher object for it; it
> doesn't use the "." matcher (a predefined object).
>
>> [...] as soon as you are introducing collation elements
>> in regexps, these are sorted by collation, and collations are
>> locale-sensitive...
>
> I don't see why they need to be sorted. All that matters is
> that you find the longest match. [a-z\q{ch}] will match "ch"
> in "chinchilla" rather than just "c".
>
>> In addition, the meaning of ranges in sets like [a-z] should also be
>> consistent with the collation used...
>
> I disagree with this. I think that having [a-z] magically
> mean all characters in a particular language is asking for
> trouble. In French, would you say that [a-z] should match
> C WITH CEDILLA or A + ACUTE?

Having that kind of support allows regexes to be written that match, say,
the top half of a list by using [a-k], etc. That's something you can do
in English today, but not in any other language. You need to decide
whether extending regexes to other languages should allow such uses (in
which case you think in terms of collation elements and sorting order)
or not.

Depending on how many accented letters a language uses, writing the
equivalent expression manually can be both tedious and error-prone.
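As a rough illustration of generating such a class mechanically instead of writing it by hand, the sketch below expands a range like [a-k] by comparing base letters after stripping accents. This is an assumption-laden stand-in for collation, not the Unicode Collation Algorithm: `primary_letter`, `accent_insensitive_range` and the `FRENCH` repertoire are hypothetical names, and real French collation involves more than ignoring combining marks.

```python
import re
import unicodedata


def primary_letter(ch: str) -> str:
    """Rough stand-in for a primary collation key: strip accents via NFD.

    This is NOT the Unicode Collation Algorithm; it only drops combining
    marks so that e.g. 'é' and 'ç' compare like 'e' and 'c'.
    """
    return unicodedata.normalize("NFD", ch)[0].lower()


def accent_insensitive_range(lo: str, hi: str, repertoire: str) -> str:
    """Build a regex class with every character of `repertoire` whose
    base letter falls between `lo` and `hi` (inclusive)."""
    members = sorted({c for c in repertoire if lo <= primary_letter(c) <= hi})
    return "[" + "".join(re.escape(c) for c in members) + "]"


# Hypothetical repertoire: basic Latin lowercase plus French accented letters.
FRENCH = "abcdefghijklmnopqrstuvwxyzàâçéèêëîïôùûüÿœ"

pattern = re.compile(accent_insensitive_range("a", "k", FRENCH) + "+")
print(bool(pattern.fullmatch("café")))   # True: c, a, f, é all fall in a..k
print(bool(pattern.fullmatch("élève")))  # False: 'l' and 'v' are outside a..k
```

The generated class lists a, à, â, b, c, ç, d, e, é, è, ê, ë, and so on, which is the tedious part one would otherwise type out by hand.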

BTW, in Swedish, for example, [a-z] would not match all letters, since a
with ring, a with dieresis and o with dieresis sort after z. So it's not
a question of making [a-z] magic, but of whether the elements in [ ] are
character codes or collation elements.
>
> It's my opinion that ranges inside [] should be simple binary
> order. If you want to do anything fancier, there should be
> new syntax for it.

That, or an option?
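To make the distinction between character codes and collation elements concrete, here is a minimal sketch of a range builder with an option that selects either binary (code point) order or a supplied alphabet order. The helper `char_class` and the toy `SWEDISH_ORDER` string are hypothetical; a real Swedish tailoring covers uppercase, accented variants and other details this lowercase-only string ignores.

```python
import re
from typing import Optional

# Toy Swedish alphabet order (assumption: lowercase only, no other
# tailoring details). å, ä, ö sort after z.
SWEDISH_ORDER = "abcdefghijklmnopqrstuvwxyzåäö"


def char_class(lo: str, hi: str, order: Optional[str] = None) -> str:
    """Return a regex character class for the range lo..hi.

    order=None -> binary (code point) range, as [lo-hi] behaves today.
    order=str  -> every character between lo and hi in the given order.
    """
    if order is None:
        return f"[{re.escape(lo)}-{re.escape(hi)}]"
    start, end = order.index(lo), order.index(hi)
    return "[" + "".join(re.escape(c) for c in order[start:end + 1]) + "]"


binary = re.compile(char_class("a", "ö"))                  # U+0061..U+00F6
swedish = re.compile(char_class("a", "ö", SWEDISH_ORDER))  # the 29 letters

for ch in ["ä", "{", "×"]:
    print(ch, bool(binary.fullmatch(ch)), bool(swedish.fullmatch(ch)))
# ä True True
# { True False
# × True False
```

Under code point order, [a-ö] silently admits symbols such as '{' and '×'; under the toy Swedish order it covers exactly the 29 letters, and the range a..z in that order still excludes å, ä and ö.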

Now, other than for canonical decompositions (and conjoining Jamo), I
haven't seen an example that shows why it is useful for a regex package
to be able to match 'ch' as if it were a single code point. Can somebody
please present a simple example of an important use case that can't be
realized if regexes are limited to single characters (plus *canonical*
equivalents)?

After all, the atomic elements for writing would be the 'c' and 'h'; it
is only for the purpose of some other text operations that 'ch' is
(sometimes) considered a unit.
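Mike's [a-z\q{ch}] example can be emulated in an ordinary regex engine without new syntax, which also shows what treating 'ch' as a unit actually changes. This is a minimal sketch using Python's re module; `class_with_strings` is a hypothetical helper, not UTS #18 syntax or an existing API. Because Python's alternation is leftmost-first rather than POSIX longest-match, the multi-character strings are listed first, longest first.

```python
import re
from typing import Sequence


def class_with_strings(single_chars: str, strings: Sequence[str]) -> str:
    """Emulate a class like [a-z\\q{ch}] with plain alternation.

    Multi-character strings come first (longest first) so that the
    leftmost-first alternation of Python's re behaves like longest match.
    """
    alts = sorted(strings, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(s) for s in alts) + f"|[{single_chars}])"


unit = class_with_strings("a-z", ["ch"])  # "(?:ch|[a-z])"
print(re.findall(unit, "chinchilla"))
# ['ch', 'i', 'n', 'ch', 'i', 'l', 'l', 'a']
```

Scanning "chinchilla" this way yields 8 units instead of 10 code points, so the choice affects quantifier counts such as {8} versus {10}.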

This archive was generated by hypermail 2.1.5 : Tue Oct 02 2007 - 11:10:19 CST