From: Mark Davis (mark.davis@jtcsv.com)
Date: Fri Aug 22 2003 - 11:31:40 EDT
The purpose of the Pattern Syntax characters is *not* to list everything that is
a symbol or punctuation mark. That information already exists independently, in
the general category values. Think of them as operators in the engine syntax, the
way "?" and "*" are used today in Perl, or the way +, -, /, * could be used in
math expressions.
The goal is to have a relatively small, unchangeable list of ranges that
reasonably restricts which characters can serve as syntax characters, now or in
the future, in a general pattern environment. General regular expression
engines, for example, would *not* add 05C3 HEBREW PUNCTUATION SOF PASUQ as an
operator, to indicate (say) a non-greedy match variant of *.
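
To make the distinction concrete, here is a minimal Python sketch. It assumes a hypothetical stand-in for the syntax set (ASCII punctuation and symbols only), which is *not* the list in the draft. The names ILLUSTRATIVE_SYNTAX, is_punct_or_symbol and is_illustrative_syntax are invented for illustration. It shows only that "is punctuation" (from the general category) and "is reserved for pattern syntax" are separate questions.

```python
import unicodedata

# Hypothetical stand-in for a small, closed syntax set: ASCII punctuation
# and symbols only. This is NOT the Pattern_Syntax list from the draft.
ILLUSTRATIVE_SYNTAX = frozenset(
    chr(cp)
    for cp in range(0x21, 0x7F)
    if unicodedata.category(chr(cp))[0] in ("P", "S")
)


def is_punct_or_symbol(ch: str) -> bool:
    """True if the general category is punctuation (P*) or symbol (S*)."""
    return unicodedata.category(ch)[0] in ("P", "S")


def is_illustrative_syntax(ch: str) -> bool:
    """True if ch is in the hypothetical fixed syntax set above."""
    return ch in ILLUSTRATIVE_SYNTAX


if __name__ == "__main__":
    # "*" and "?" are operators today; the Hebrew marks are punctuation
    # by general category but are outside the illustrative syntax set.
    for ch in ("*", "?", "\u05C3", "\u05BE"):
        print(
            f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
            f"punct/symbol={is_punct_or_symbol(ch)}, "
            f"syntax={is_illustrative_syntax(ch)}"
        )
```

In this sketch SOF PASUQ and MAQAF still count as punctuation, but an engine working from the fixed set would never treat them as operators.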
Mark
__________________________________
http://www.macchiato.com
► “Eppur si muove” ◄
----- Original Message -----
From: "Peter Kirk" <peterkirk@qaya.org>
To: "Marco Cimarosti" <marco.cimarosti@essetre.it>
Cc: <unicode@unicode.org>
Sent: Friday, August 22, 2003 07:45
Subject: Re: Proposed Draft UTR #31 - Syntax Characters
> On 22/08/2003 06:04, Marco Cimarosti wrote:
>
> >Rick McGowan wrote:
> >
> >
> >>the process as possible so that it can be considered
> >>The draft is found at http://www.unicode.org/reports/tr31/
> >>and feedback can be submitted as described there.
> >>
> >>
> >
> >(Before submitting official feedback, I'd like to discuss my comments here.
> >BTW, which "Type of Message" should I use in the feedback form? Is it OK to
> >use "Technical Report or Tech Note issues"?)
> >
> >
> >My two cents are both about adding characters in the <Pattern_Syntax> of
> >"4.1 Proposed Pattern Properties".
> >
> >IMHO:
> >
> > 1. Full-width, half-width, and "small" punctuation characters should be
> >in class <Pattern_Syntax>, like their "normal width" counterparts.
> >
> > 2. Non-Latin punctuation characters should be in class
> ><Pattern_Syntax>, like their Latin counterparts.
> >
> >...
> >
> >Should any of the above characters be added to <Pattern_Syntax> (i.e. *not*
> >allowed in identifiers)?
> >
> >_ Marco
> >
> We should include
>
> 05C3 HEBREW PUNCTUATION SOF PASUQ
>
> as this is similar in appearance, at least in many Hebrew fonts, and in
> function to a colon. Also, if the ordinary Latin hyphen, quotation mark,
> vertical line, etc. are included, so should be
>
> 05BE HEBREW PUNCTUATION MAQAF
> 05C0 HEBREW PUNCTUATION PASEQ
> 05F3 HEBREW PUNCTUATION GERESH
> 05F4 HEBREW PUNCTUATION GERSHAYIM
>
> and their equivalents in Armenian, Syriac, etc. Indeed, why not include
> everything with punctuation properties? According to tr31, "some
> script-specific characters were removed". Why? What remains is also
> script-specific, but just for Latin script.
>
> --
> Peter Kirk
> peter@qaya.org (personal)
> peterkirk@qaya.org (work)
> http://www.qaya.org/
>