Re: New Public Review Issue: Proposed Update UTS #18

From: Jonathan Coxhead (jonathan@doves.demon.co.uk)
Date: Mon Sep 24 2007 - 05:33:28 CDT

Next message: Mike: "Re: New Public Review Issue: Proposed Update UTS #18"

Previous message: Mike: "Re: Unicode Regex Design (was Re: New Public Review Issue: Proposed Update UTS #18)"
In reply to: Mike: "Re: New Public Review Issue: Proposed Update UTS #18"
Next in thread: Mike: "Re: New Public Review Issue: Proposed Update UTS #18"
Reply: Mike: "Re: New Public Review Issue: Proposed Update UTS #18"
Reply: Philippe Verdy: "RE: New Public Review Issue: Proposed Update UTS #18"

Mike wrote:

> I played around with the ability to add digraphs to "." and came up
> with two methods. The first would be to specifically list them using
> syntax such as:

I'd just like to point out that a "[ ]" regular expression is defined to
always match exactly one character (if it matches at all).

You can write "[abcdef]" as "(a|b|c|d|e|f)" if you like. You can also write
"(a|bb|ccc|dddd|eeeee|ffffff)", but there is no form using "[ ]" to match the
same thing.
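
As a rough illustration, here is a minimal sketch using Python's "re" module.
The choice of engine and the helper name match_length are mine, purely for
illustration; the point is only that the bracket form always consumes one
character, while the alternation can consume a varying number.

```python
import re

# A bracket expression and the equivalent single-character alternation.
char_class = re.compile(r"[abcdef]")
alternation = re.compile(r"(?:a|b|c|d|e|f)")

# Alternatives of varying length; no bracket expression expresses this.
varying = re.compile(r"(?:a|bb|ccc|dddd|eeeee|ffffff)")


def match_length(pattern, text):
    """Return the length of the match at the start of text, or None."""
    m = pattern.match(text)
    return None if m is None else len(m.group(0))


samples = ("a", "bb", "ccc", "xyz")
class_lengths = [match_length(char_class, s) for s in samples]
alt_lengths = [match_length(alternation, s) for s in samples]
varying_lengths = [match_length(varying, s) for s in samples]

print(class_lengths)    # every successful match has length 1
print(alt_lengths)      # same as the bracket form
print(varying_lengths)  # match length depends on which alternative hit
```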

"[ ]" exists primarily as an optimisation, because matching 1 character
against a set is a fast operation, whereas checking against an unknown number of
alternatives of potentially varying lengths ("( | )") is expensive.

So an expression written as "[^ ]" could never match a whole message, or the
string "New York": it could only match a single character.
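
A small sketch of that behaviour, again assuming Python's "re" module as a
stand-in for a conventional engine (other engines may differ in detail):

```python
import re

# Negated bracket expression: any single character other than a space.
negated = re.compile(r"[^ ]")
text = "New York"

first = negated.match(text)      # matches only the leading "N"
full = negated.fullmatch(text)   # cannot cover the whole string
pieces = negated.findall(text)   # one single-character match per hit

print(first.group(0))
print(full)
print(pieces)
```

Each match is one character long; the string as a whole is never matched by
the bracket expression on its own.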

What exactly this means in the context of Unicode is a different matter, but
I imagine some sort of historical consistency is desirable.

-- 
... Jonathan
    Belmont CA 94002


This archive was generated by hypermail 2.1.5 : Mon Sep 24 2007 - 05:37:06 CDT