Re: Proposed Draft UTR #31 - Syntax Characters

From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri Aug 22 2003 - 11:31:40 EDT

    The purpose of the Pattern Syntax characters is *not* to list everything that is
    a symbol or punctuation mark. That exists independently. Think of them as
    operators in the engine syntax, as "?" or "*" are used today in Perl, or as
    +, -, /, * could be used in math expressions.

    The goal is to have a relatively small, unchangeable list of ranges that
    reasonably restricts which characters can become syntax characters in the
    future in a general pattern environment. A general regular expression engine,
    for example, would *not* add 05C3 HEBREW PUNCTUATION SOF PASUQ as an operator
    to indicate (say) a non-greedy variant of *.

    Mark
    __________________________________
    http://www.macchiato.com
    ► “Eppur si muove” ◄
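
To make the "small, unchangeable list of ranges" idea concrete, here is a minimal sketch in Python. The ranges below are an ASCII-only subset chosen purely for illustration; they are an assumption, not the list proposed in the draft, and the names `SYNTAX_RANGES` and `is_syntax_char` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical, ASCII-only ranges for illustration (NOT the draft's list).
# The list is sorted, non-overlapping, and meant never to change.
SYNTAX_RANGES = [
    (0x0021, 0x002F),  # ! " # $ % & ' ( ) * + , - . /
    (0x003A, 0x0040),  # : ; < = > ? @
    (0x005B, 0x005E),  # [ \ ] ^
    (0x0060, 0x0060),  # `
    (0x007B, 0x007E),  # { | } ~
]
_RANGE_STARTS = [start for start, _ in SYNTAX_RANGES]


def is_syntax_char(ch: str) -> bool:
    """Return True if ch falls inside one of the fixed ranges."""
    cp = ord(ch)
    # Find the last range whose start is <= cp, then check its end.
    i = bisect_right(_RANGE_STARTS, cp) - 1
    return i >= 0 and cp <= SYNTAX_RANGES[i][1]


if __name__ == "__main__":
    for ch in ["*", "?", "a", "_", "\u05C3"]:
        print(f"U+{ord(ch):04X}", is_syntax_char(ch))
```

Under this assumed list, an engine may later give meaning to any character inside the ranges, while a character outside them, such as 05C3, stays available for literals and identifiers no matter how the engine's syntax grows.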

    ----- Original Message -----
    From: "Peter Kirk" <peterkirk@qaya.org>
    To: "Marco Cimarosti" <marco.cimarosti@essetre.it>
    Cc: <unicode@unicode.org>
    Sent: Friday, August 22, 2003 07:45
    Subject: Re: Proposed Draft UTR #31 - Syntax Characters

    > On 22/08/2003 06:04, Marco Cimarosti wrote:
    >
    > >Rick McGowan wrote:
    > >
    > >
    > >>the process as possible so that it can be considered
    > >>The draft is found at http://www.unicode.org/reports/tr31/
    > >>and feedback can be submitted as described there.
    > >>
    > >>
    > >
    > >(Before submitting official feedback, I'd like to discuss my comments here.
    > >BTW, which "Type of Message" should I use in the feedback form? Is it OK to
    > >use "Technical Report or Tech Note issues"?)
    > >
    > >
    > >My two cents are both about adding characters in the <Pattern_Syntax> of
    > >"4.1 Proposed Pattern Properties".
    > >
    > >IMHO:
    > >
    > > 1. Full-width, half-width, and "small" punctuation characters should be
    > >in class <Pattern_Syntax>, like their "normal width" counterparts.
    > >
    > > 2. Non-Latin punctuation characters should be in class
    > ><Pattern_Syntax>, like their Latin counterparts.
    > >
    > >...
    > >
    > >Should any of the above characters be added to <Pattern_Syntax> (i.e. *not*
    > >allowed in identifiers)?
    > >
    > >_ Marco
    > >
    > >
    > >
    > >
    > >
    > >
    > We should include
    >
    > 05C3 HEBREW PUNCTUATION SOF PASUQ
    >
    > as this is similar in appearance, at least in many Hebrew fonts, and
    > function to a colon. Also if the ordinary Latin hyphen, quotation mark,
    > vertical line etc are included, so should be
    >
    > 05BE HEBREW PUNCTUATION MAQAF
    > 05C0 HEBREW PUNCTUATION PASEQ
    > 05F3 HEBREW PUNCTUATION GERESH
    > 05F4 HEBREW PUNCTUATION GERSHAYIM
    >
    > and equivalents in Armenian, Syriac etc etc. Indeed why not include
    > everything with punctuation properties? According to tr31 "some
    > script-specific characters were removed". Why? What remains is also
    > script-specific, but just for Latin script.
    >
    > --
    > Peter Kirk
    > peter@qaya.org (personal)
    > peterkirk@qaya.org (work)
    > http://www.qaya.org/
    >
    >
    >
    >
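
The general punctuation classification that "exists independently" of the pattern syntax list can be inspected directly. The sketch below uses Python's standard `unicodedata` module to look up the General_Category of the Hebrew code points named in the quoted message; the helper name `punctuation_report` is hypothetical, and the results reflect whichever Unicode version the running Python ships with.

```python
import unicodedata

# Code points named in the quoted message.
CANDIDATES = [0x05BE, 0x05C0, 0x05C3, 0x05F3, 0x05F4]


def punctuation_report(code_points):
    """Return (hex, name, general category, is_punctuation) per code point."""
    report = []
    for cp in code_points:
        ch = chr(cp)
        category = unicodedata.category(ch)
        # All punctuation general categories begin with "P" (Pd, Po, Ps, ...).
        report.append(
            (f"{cp:04X}", unicodedata.name(ch), category, category.startswith("P"))
        )
    return report


if __name__ == "__main__":
    for row in punctuation_report(CANDIDATES):
        print(row)
```

A character having a punctuation category in this sense is a separate question from whether it belongs to a fixed set reserved for pattern operators.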


