RE: Proposed Draft UTR #31 - Syntax Characters

From: Marco Cimarosti
Date: Fri Aug 22 2003 - 14:29:04 EDT

    Mark Davis wrote:
    > Technical Report issues would be fine.
    > I think #1 is worth considering. For #2, see other message to
    > Peter Kirk.

    I agree with your statement: "The purpose of the Pattern Syntax characters
    is *not* to list everything that is a symbol or punctuation mark". But that
    is what Kirk suggested, not what I proposed.

    I proposed to exclude a *limited* set of script-specific punctuation that
    *might* be confused with punctuation characters normally used in the syntax
    of computer languages, either because they look identical, or because they
    are perceived culturally as "another form of the same character".

    E.g., I left out of the list everything belonging to ancient scripts
    (who's going to write programs in Linear B!?) and anything that I suspected
    would be valid inside a word or expression: hyphens, emphasis markers,
    ellipsis marks, etc.

    You said that the list of ranges must be invariable, but I doubt that we
    will see many new *modern* and *commonly* used punctuation marks in future
    versions, so I think that this requirement for invariability can reasonably
    be met.

    I already made the example of the Greek question mark which may be mistaken
    for a semicolon. That is *not* an unlikely situation: if a Greek programmer
    has his keyboard in Greek mode (because he just finished typing an
    identifier containing Greek letters) he may well forget to turn it to Latin
    mode before typing the trailing semicolon.
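
    The confusion described above can be made concrete with a minimal Python
    sketch. It relies only on the standard `unicodedata` module; the helper
    names `describe` and `find_semicolon_lookalikes` are hypothetical and
    exist only for this illustration. It shows that the Greek question mark
    (U+037E) and the semicolon (U+003B) are distinct code points, even though
    U+037E is canonically equivalent to U+003B and so folds into it under NFC
    normalization.

```python
import unicodedata

GREEK_QUESTION_MARK = "\u037e"  # renders like ';'
SEMICOLON = "\u003b"            # the ASCII semicolon


def describe(ch):
    """Return a 'U+XXXX NAME' label for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def find_semicolon_lookalikes(source):
    """List (index, label) for characters that are not ';' themselves
    but become ';' after NFC normalization."""
    hits = []
    for i, ch in enumerate(source):
        if ch != SEMICOLON and unicodedata.normalize("NFC", ch) == SEMICOLON:
            hits.append((i, describe(ch)))
    return hits


if __name__ == "__main__":
    # A statement with a Greek identifier, ended by mistake in Greek mode.
    statement = "\u03b1 = 1" + GREEK_QUESTION_MARK
    for index, label in find_semicolon_lookalikes(statement):
        print(f"position {index}: {label} looks like a semicolon")
```

    A parser could run such a check before reporting a generic syntax error,
    so that the message names the offending character.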

    Similarly, due to the fact that some punctuation characters (parentheses,
    etc.) are mirrored in an RTL context, an Arab programmer may think that "؟"
    is just an alternate RTL glyph for "?", so he may be puzzled by apparently
    absurd error messages.

    E.g., he types:


    And the system calls routine "foo" passing variable "bar" to it. ("?" is the
    "call" operator of this hypothetical programming language).

    So, he switches to Arabic mode and types:


    But the system says: "Undeclared identifier". Yet he is *sure* that he did
    declare a routine named "فو", and a variable named "بار", so what's going
    on? If the system said instead: "Character '؟' is not a legal operator",
    everything would be much clearer.
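
    The clearer diagnostic can be sketched as a toy lexer in Python. Everything
    here is an assumption made for illustration: the language is the
    hypothetical one from the example, the statement form "routine ? argument"
    is a guess at its syntax, and the names `tokenize`, `LexError` and
    `OPERATORS` are invented. The idea is simply that any punctuation or
    symbol character outside the language's operator set is rejected by name
    instead of being absorbed into an identifier.

```python
import unicodedata

OPERATORS = {"?"}  # hypothetical language: "?" is the "call" operator


class LexError(Exception):
    """Raised when a syntax-like character is not a known operator."""


def is_syntax_like(ch):
    """True for punctuation/symbol characters, except '_' (allowed in names)."""
    return ch != "_" and unicodedata.category(ch)[0] in "PS"


def tokenize(source):
    """Split source into ('IDENT', text) and ('OP', text) tokens."""
    tokens = []
    word = ""
    for ch in source:
        if ch.isspace() or is_syntax_like(ch):
            if word:  # close the identifier collected so far
                tokens.append(("IDENT", word))
                word = ""
            if ch in OPERATORS:
                tokens.append(("OP", ch))
            elif not ch.isspace():
                raise LexError(
                    f"Character '{ch}' (U+{ord(ch):04X}) is not a legal operator"
                )
        else:
            word += ch
    if word:
        tokens.append(("IDENT", word))
    return tokens


if __name__ == "__main__":
    print(tokenize("foo ? bar"))
    try:
        # Arabic identifiers joined by U+061F ARABIC QUESTION MARK
        tokenize("\u0641\u0648 \u061f \u0628\u0627\u0631")
    except LexError as err:
        print(err)
```

    Treating unknown punctuation as an error at the lexing stage is what
    makes the precise message possible; a lexer that lets such characters
    fall through into identifiers can only report the identifier as
    undeclared.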

    _ Marco