    On 22/08/2003 08:31, Mark Davis wrote:

    >The purpose of the Pattern Syntax characters is *not* to list everything that is
    >a symbol or punctuation mark. That exists independently. Think of them as
    >operators in the engine syntax, as "?" or "*" are used today in Perl, or as
    >+, -, /, * could be used in math expressions.
    >The goal is to have a relatively small, unchangeable list of ranges, which
    >contain a reasonable restriction on characters for future syntax characters in a
    >general pattern environment. General regular expression engines, for example,
    >would *not* add 05C3 HEBREW PUNCTUATION SOF PASUQ as an operator, to indicate
    >(say) a non-greedy match variant of *.

    Maybe I misunderstood what Marco was talking about. No, I would not
    expect a separate SOF PASUQ operator. My point was more that a Hebrew
    user might accidentally type, or prefer to type, SOF PASUQ instead of a
    colon, etc.

    I don't think we should be defining as an "unchangeable list" only Latin
    characters for the syntax, thus tying computer languages inseparably to
    the Latin alphabet. That would give the Africans some good reasons to
    complain that Unicode is too American and/or European. It's the
    "unchangeable" which makes me very nervous here. For now all computer
    languages are Latin alphabet based, as far as I know, but who knows what
    will happen in 50-100 years? Computer languages based on Devanagari,
    Arabic or Japanese scripts would be a real possibility (or Cyrillic, but
    the punctuation is the same as Latin).

    Now it would be a different matter if we could somehow reserve other
    punctuation characters for further extension. Then we could allow Latin
    punctuation to be used as operators but require that all other
    punctuation be quoted.
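
    To make that last rule concrete, here is a rough sketch of how an engine
    might enforce it. It is only an illustration under my own assumptions:
    "Latin punctuation" is approximated as ASCII punctuation and symbols,
    quoting is a backslash escape, and the names is_punct_or_symbol and
    unquoted_reserved are hypothetical, not taken from any existing engine
    or from the proposal itself.

```python
import unicodedata


def is_punct_or_symbol(ch):
    # Unicode general categories starting with "P" (punctuation) or "S" (symbol).
    return unicodedata.category(ch)[0] in ("P", "S")


def unquoted_reserved(pattern, escape="\\"):
    """Return (index, char) pairs for non-ASCII punctuation that is not quoted.

    Assumption: ASCII punctuation may act as an operator and is never reported;
    any other punctuation or symbol must be preceded by the escape character.
    """
    problems = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            i += 2  # skip the escape and the character it quotes
            continue
        if ord(ch) > 0x7F and is_punct_or_symbol(ch):
            problems.append((i, ch))
        i += 1
    return problems
```

    With this, a pattern containing a bare U+05C3 SOF PASUQ would be flagged
    as needing a quote, while the same character after a backslash, or an
    ordinary "*", would pass.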

