From: Peter Kirk (firstname.lastname@example.org)
Date: Wed Aug 20 2003 - 19:21:07 EDT
On 20/08/2003 11:23, Rick McGowan wrote:
>This notice is relevant to anyone dealing with programming languages, query
>specifications, regular expressions, scripting languages, and similar domains.
>The Proposed Draft UTR #31: Identifier and Pattern Syntax will be discussed at
>the UTC meeting next week. Part of that document (Section 4) is a proposal for
>two new immutable properties, Pattern_White_Space and Pattern_Syntax. As
>immutable properties, these would not ever change once they are introduced into
>the standard, so it is important to get feedback on their contents beforehand.
>The UTC will not be making a final determination on these properties at this
>meeting, but it is important that any feedback on them is supplied as early in
>the process as possible so that it can be considered thoroughly. The draft is
>found at http://www.unicode.org/reports/tr31/ and feedback can be submitted as
> Rick McGowan
> Unicode, Inc.
I'm a little concerned about the implications of counting zero-width
characters like LRM and RLM as white space. They can easily find their
way unnoticed into the middle of patterns, e.g. when copying from a text
to which these characters have been added to ensure correct
directionality. I wonder if it might be better to add a new category of
ignored characters, such that one of these found on its own doesn't
count as a separator, but is ignored, i.e. treated as part of the white
space, if found adjacent to white space. Of course the details of this
need a little more thought, e.g. whether one of these found on its own
actually counts as part of the pattern, but I hope you see what I am
getting at.
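
To make the suggestion a little more concrete, here is a rough sketch in
Python of how such a rule might behave. It is only an illustration under
stated assumptions, not anything taken from the draft: the names
split_pattern, IGNORABLE, WHITE_SPACE and keep_isolated are hypothetical,
and the white space set is a small stand-in rather than the proposed
Pattern_White_Space contents. The open question of whether an isolated
mark counts as part of the pattern is left as a switch.

```python
import re

LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK

# Hypothetical "ignored" category: just the two directional marks.
IGNORABLE = LRM + RLM

# Stand-in white space set for illustration only (an assumption, not the
# Pattern_White_Space list from the draft).
WHITE_SPACE = " \t\n\r"

# A separator must contain at least one real white space character.
# Ignorable characters touching that white space are absorbed into it.
_SEPARATOR = re.compile(
    "[" + IGNORABLE + "]*[" + WHITE_SPACE + "][" + WHITE_SPACE + IGNORABLE + "]*"
)


def split_pattern(text, keep_isolated=True):
    """Split text into tokens under the sketched rule.

    An ignorable character with no adjacent white space never separates
    tokens. keep_isolated decides whether such a character stays in the
    token (True) or is dropped from it (False).
    """
    tokens = [t for t in _SEPARATOR.split(text) if t]
    if not keep_isolated:
        strip = str.maketrans("", "", IGNORABLE)
        tokens = [t.translate(strip) for t in tokens]
        tokens = [t for t in tokens if t]
    return tokens
```

With this sketch, an LRM next to a space is swallowed by the separator, so
"a", LRM, space, "b" splits into two tokens, while "a", LRM, "b" with no
white space stays a single token, with or without the mark depending on
keep_isolated.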
--
Peter Kirk
email@example.com (personal)
firstname.lastname@example.org (work)
http://www.qaya.org/