Re: Proposed Draft UTR #31 - Syntax Characters

From: Ben Dougall (bend@freenet.co.uk)
Date: Thu Aug 21 2003 - 13:28:46 EDT

    i'd say wide. narrow means leaving out some characters that would
    naturally fit into 'white space'. if i were parsing some text i'd
    consider a non-breaking space white space, and i'd expect my code to
    reflect that. why would you not want your code to treat a non-breaking
    space or mathematical space as white space?
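
    to make the difference concrete, here is a minimal Python sketch, not
    anything taken from the draft. the code points are copied from the two
    lists quoted below; the names NARROW, BROAD, expand and split_pattern
    are made up for illustration, and splitting on white space is only a
    stand-in for whatever a real pattern parser would do.

```python
# Code points copied from the two candidate lists in the quoted message.
NARROW_RANGES = [
    (0x0009, 0x000D),  # CHARACTER TABULATION..CARRIAGE RETURN (CR)
    (0x0020, 0x0020),  # SPACE
    (0x0085, 0x0085),  # NEXT LINE (NEL)
    (0x200E, 0x200F),  # LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
    (0x2028, 0x2028),  # LINE SEPARATOR
    (0x2029, 0x2029),  # PARAGRAPH SEPARATOR
]
BROAD_EXTRA_RANGES = [
    (0x00A0, 0x00A0),  # NO-BREAK SPACE
    (0x2000, 0x200A),  # EN QUAD..HAIR SPACE
    (0x202F, 0x202F),  # NARROW NO-BREAK SPACE
    (0x205F, 0x205F),  # MEDIUM MATHEMATICAL SPACE
    (0x3000, 0x3000),  # IDEOGRAPHIC SPACE
]


def expand(ranges):
    """Turn inclusive (first, last) code point ranges into a set of characters."""
    return frozenset(chr(cp) for lo, hi in ranges for cp in range(lo, hi + 1))


NARROW = expand(NARROW_RANGES)
BROAD = NARROW | expand(BROAD_EXTRA_RANGES)


def split_pattern(text, white_space):
    """Split text into tokens at every run of characters found in white_space."""
    tokens, current = [], []
    for ch in text:
        if ch in white_space:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


sample = "a\u00A0b c"  # 'a', NO-BREAK SPACE, 'b', SPACE, 'c'
print(split_pattern(sample, NARROW))  # ['a\xa0b', 'c']
print(split_pattern(sample, BROAD))   # ['a', 'b', 'c']
```

    with the narrow set the no-break space stays inside the first token;
    with the broad set it separates tokens like any other space.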

    On Thursday, August 21, 2003, at 04:44 pm, Mark Davis wrote:

    > There is one open issue I'd like to draw people's attention to:
    > whether to have a narrow or broader approach to the whitespace in a
    > pattern environment. The narrower definition would be:
    >
    > 0009..000D ; Pattern_White_Space # <CHARACTER TABULATION>..<CARRIAGE RETURN (CR)>
    > 0020 ; Pattern_White_Space # SPACE
    > 0085 ; Pattern_White_Space # <NEXT LINE (NEL)>
    > 200E..200F ; Pattern_White_Space # LEFT-TO-RIGHT MARK..RIGHT-TO-LEFT MARK
    > 2028 ; Pattern_White_Space # LINE SEPARATOR
    > 2029 ; Pattern_White_Space # PARAGRAPH SEPARATOR
    >
    > while the broader one would add:
    >
    > 00A0 ; Pattern_White_Space # NO-BREAK SPACE
    > 2000..200A ; Pattern_White_Space # EN QUAD..HAIR SPACE
    > 202F ; Pattern_White_Space # NARROW NO-BREAK SPACE
    > 205F ; Pattern_White_Space # MEDIUM MATHEMATICAL SPACE
    > 3000 ; Pattern_White_Space # IDEOGRAPHIC SPACE
    >
    > My judgement is that in a pattern environment the narrower definition
    > would be better. One might go so far as recommending that the others
    > be quoted, to reduce possible confusion when reading regular
    > expressions, queries, or other patterns.
    >
    > Mark
    > __________________________________
    > http://www.macchiato.com
    > ► “Eppur si muove” ◄
    >
    > ----- Original Message -----
    > From: <Jill.Ramonsky@Aculab.com>
    > To: <unicode@unicode.org>
    > Sent: Thursday, August 21, 2003 02:44
    > Subject: RE: Proposed Draft UTR #31 - Syntax Characters
    >
    >
    >>
    >>> This notice is relevant to anyone dealing with programming languages,
    >>> query specifications, regular expressions, scripting languages, and
    >>> similar domains.
    >>
    >> That's me.
    >>
    >> I read the draft, and actually I was very happy with it. No
    >> complaints at all. I am particularly happy that the mathematical
    >> letters and numbers (1D400-1D7FF) will be permitted in identifiers.
    >> This is important because it allows mathematical expressions and
    >> programming-language expressions to use the same symbols (for the
    >> first time!). I also noted the comment about how specific programming
    >> languages could, if they wished, ignore <font> equivalences (and hence
    >> ignore the mathematical letters and numbers) - so I guess that keeps
    >> everyone happy.
    >>
    >> I would have used the feedback form, but I didn't see much point as I
    >> had no complaints.
    >> Jill
    >>
    >>
    >>
    >> -----Original Message-----
    >> From: Rick McGowan [mailto:rick@unicode.org]
    >> Sent: Wednesday, August 20, 2003 7:23 PM
    >> To: unicode@unicode.org
    >> Subject: Proposed Draft UTR #31 - Syntax Characters
    >>
    >>
    >> This notice is relevant to anyone dealing with programming languages,
    >> query specifications, regular expressions, scripting languages, and
    >> similar domains.
    >>
    >> The Proposed Draft UTR #31: Identifier and Pattern Syntax will be
    >> discussed at the UTC meeting next week. Part of that document
    >> (Section 4) is a proposal for two new immutable properties,
    >> Pattern_White_Space and Pattern_Syntax. As immutable properties, these
    >> would not ever change once they are introduced into the standard, so
    >> it is important to get feedback on their contents beforehand.
    >>
    >> The UTC will not be making a final determination on these properties
    >> at this meeting, but it is important that any feedback on them is
    >> supplied as early in the process as possible so that it can be
    >> considered thoroughly. The draft is found at
    >> http://www.unicode.org/reports/tr31/ and feedback can be submitted as
    >> described there.
    >>
    >> Regards,
    >> Rick McGowan
    >> Unicode, Inc.
    >>
    >>
    >
    >


