Re: Proposal for additional syntax (was Re: New Public Review Issue: Proposed Update UTS #18)

From: Mark Davis
Date: Tue Oct 02 2007 - 11:24:01 CST


    On 10/2/07, Philippe Verdy <> wrote:
    > Mark Davis Wrote:
    > > Also, there were some interesting suggestions for syntax additions
    > > that may be worth mentioning in informative text.
    > > 1. not equals
    > > As well as
    > > * \P{propname=value} and [:^propname=value:]
    > > to have:
    > > * \{propname!=value}, \p{propname≠value}
    > > * [:propname!=value:], [:propname≠value:]
    > I'm not sure that \{propname!=value} should be defined, or recommended or

    The \{ was a typo; it should have been \p{
    > Also you propose mixing \p and \P for similar use. The only good suggestion
    > is the way to represent the "different" relation using an alternate operator
    > replacing the equal sign, instead of using a leading negation (using a
    > capital \P instead of \p, or a leading ^ operator in a class notation)
    > before the encoded equality.

    I don't understand. I am not proposing to mix them: that is standard
    notation, promulgated by Perl.
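As a minimal sketch of the equivalence under discussion, the Python below models the proposed \p{gc!=value} and the existing \P{gc=value} as predicates over a single character, using only the standard unicodedata module. The helper names are hypothetical and do not come from any regex engine; the sketch handles only two-letter General_Category values and one-letter groups such as L.

```python
import unicodedata

# Hypothetical helpers (not from any regex engine) modelling property
# tests on one character.

def gc_equals(ch: str, value: str) -> bool:
    # True if ch's General_Category equals value. Accepts a two-letter
    # value ("Lu") or a one-letter group ("L"); other group aliases such
    # as "LC" are not handled in this sketch.
    gc = unicodedata.category(ch)
    return gc == value or (len(value) == 1 and gc[0] == value)

def p_not_equal(ch: str, value: str) -> bool:
    # Proposed \p{gc!=value}: the "different" operator.
    return not gc_equals(ch, value)

def capital_P_equal(ch: str, value: str) -> bool:
    # Existing \P{gc=value}: the complement of \p{gc=value}.
    return not gc_equals(ch, value)

sample = "Ab1 \u0301"  # categories: Lu, Ll, Nd, Zs, Mn
for value in ("Lu", "L", "Nd", "M"):
    for ch in sample:
        # For a single-valued property the two spellings always agree.
        assert p_not_equal(ch, value) == capital_P_equal(ch, value)

print([unicodedata.category(c) for c in sample if p_not_equal(c, "Lu")])
# → ['Ll', 'Nd', 'Zs', 'Mn']
```

The two functions are deliberately identical: the "!=" form adds no expressive power for a single-valued property, only an alternative spelling.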

    > For the rest, the "[: ... :]" bracketing is easily perceived everywhere as


    Again, standard notation.

    > 2. multiple values(...)
    > > * \p{gc=L|M|Nd} instead of [\p{gc=L}\p{gc=M}\p{gc=Nd}]
    > Good suggestion but it is quite related to your suggestion 3:
    > > 3. regex values
    > > * propname=/regexForValue/
    > > eg
    > > * \p{name=/MARK/} or equivalently \N{/MARK/}
    > So multiple values would also be encoded using your suggestion 3 as:
    > * \p{gc=/L|M|Nd/}

    Yes. However, there is a difference: a list of multiple values can often be
    computed more easily than a regex, which has to be tested against every
    value of the property.
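To illustrate the computational difference, here is a hedged Python sketch. It assumes that a regex value is matched, with full-match semantics, against the two-letter General_Category codes; under that assumption the group L has to be written as L. in the regex, whereas the value list just expands each listed value directly. All helper names are hypothetical.

```python
import re
import unicodedata

# The 30 two-letter General_Category values returned by unicodedata.category().
GC_VALUES = [
    "Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Mc", "Me", "Nd", "Nl", "No",
    "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po", "Sm", "Sc", "Sk", "So",
    "Zs", "Zl", "Zp", "Cc", "Cf", "Cs", "Co", "Cn",
]

def expand(value: str) -> set:
    # A one-letter value such as "L" stands for the group Lu|Ll|Lt|Lm|Lo.
    return {v for v in GC_VALUES if v == value or (len(value) == 1 and v[0] == value)}

def values_from_list(spec: str) -> set:
    # \p{gc=L|M|Nd}: the union of the listed values, expanded directly.
    result = set()
    for value in spec.split("|"):
        result |= expand(value)
    return result

def values_from_regex(pattern: str) -> set:
    # \p{gc=/.../}: assumption: every property value is full-matched
    # against the regex, so the whole value set must be scanned.
    rx = re.compile(pattern)
    return {v for v in GC_VALUES if rx.fullmatch(v)}

def matches(ch: str, allowed: set) -> bool:
    return unicodedata.category(ch) in allowed

listed = values_from_list("L|M|Nd")
assert listed == values_from_regex(r"L.|M.|Nd")
print(matches("7", listed), matches(";", listed))  # → True False
```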

    > What do you mean in \p{name=/MARK/}: does this indicate that it will match
    > any character whose property value "equals" the matched regexp, or
    > "contains" the regexp? I would not suggest the "contains" meaning; this is
    > not needed because it should be:
    > * \p{name=/.*MARK.*/}

    As I wrote, that's an open issue.
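The open issue is whether the regex must match the whole property value or only part of it. The Python sketch below (a hypothetical helper built on the standard re and unicodedata modules) shows how the two readings differ for U+0022 QUOTATION MARK: re.fullmatch corresponds to the "equals" reading, re.search to the "contains" reading.

```python
import re
import unicodedata

def name_matches(ch: str, pattern: str, contains: bool) -> bool:
    # Hypothetical helper: test a regex against the character's Name value.
    # contains=False: the regex must match the whole name ("equals").
    # contains=True: the regex may match anywhere in the name ("contains").
    name = unicodedata.name(ch, "")  # unnamed code points yield ""
    if contains:
        return re.search(pattern, name) is not None
    return re.fullmatch(pattern, name) is not None

quote = '"'  # U+0022, Name = QUOTATION MARK
assert not name_matches(quote, "MARK", contains=False)  # "equals": no
assert name_matches(quote, "MARK", contains=True)       # "contains": yes
assert name_matches(quote, ".*MARK.*", contains=False)  # explicit .* form
```

Under the "equals" reading, the ".*MARK.*" spelling recovers the "contains" behaviour explicitly.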

    > But then, why are the slashes needed? If you look at suggestion 2, the
    > leading and trailing slashes are not used, but the multiple values are also
    > encoded as a regexp. So your suggestion 3 (regexp values) could as well be
    > supported using the notation in suggestion 2:
    > * \p{name=MARK} or equivalently \N{MARK}

    No. You can't then distinguish an exact match to "MARK".
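One way to see why the delimiters matter: with slashes, a value can be parsed unambiguously as either a literal name or a pattern. The hypothetical parser below assumes "contains" semantics for /.../ values (the thread leaves that question open) and exact comparison for literal values; it uses only the standard re and unicodedata modules.

```python
import re
import unicodedata

def name_predicate(value: str):
    # Hypothetical parser for the value part of \p{name=...}.
    if len(value) >= 2 and value.startswith("/") and value.endswith("/"):
        rx = re.compile(value[1:-1])
        # Assumption: "contains" semantics (re.search) for regex values.
        return lambda ch: rx.search(unicodedata.name(ch, "")) is not None
    # No slashes: a literal name, compared exactly.
    return lambda ch: unicodedata.name(ch, "") == value

exact = name_predicate("QUOTATION MARK")  # literal: exactly this name
pattern = name_predicate("/MARK/")        # regex: any name containing MARK
literal_mark = name_predicate("MARK")     # literal: a name equal to "MARK"

assert exact('"') and not exact("!")
assert pattern('"') and pattern("!")      # QUOTATION MARK, EXCLAMATION MARK
assert not literal_mark('"')
```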

    > If you need to encode the "contains" relation rather than the "equals"
    > relation, I think this relation should be encoded explicitly:
    > * \p{name=.*MARK.*} or equivalently \N{.*MARK.*}
    > At least this way, it does not change the reading of the "=" operator as
    > "equals" in the notation, which can then be replaced where needed by a
    > "different" operator or a negated assertion containing the "=" operator
    > (related to "does not contain" if there's a regexp in the value starting
    > and finishing with ".*").

    Trying to parse your language, what I read you as saying is that a
    different equivalence operator could be used instead of the slashes, like

    instead of

