Re: Proposed Draft UTR #31 - Syntax Characters

From: Peter Kirk
Date: Mon Aug 25 2003 - 10:13:54 EDT

    On 25/08/2003 05:16, Marco Cimarosti wrote:

    >Peter Kirk wrote:
    >>But the other way round is less of a problem. So I am suggesting that
    >>for now we define all punctuation characters except for those with
    >>specifically defined operator functions, also all undefined
    >>characters, as giving a syntax error. This makes it possible
    >>to define additional punctuation characters, even those in so far
    >>undefined scripts like Tifinagh, as valid operators in future
    >Yes, but this makes it impossible to use any as-yet undefined scripts in
    >identifiers! E.g., you'd never be able to write a variable name in Tifinagh
    >letters in future versions!
    >Unless you are still thinking at non-fixed sets, in which case I must remind
    >you again that there are no balls or door-keepers in a card game... :-)
    Well, I was thinking that as soon as a new character is defined which is
    not punctuation, it automatically becomes valid in identifiers. Obviously
    there are a few problems of detail there, but only of the kinds which have
    to be faced by any program which has to deal with as yet undefined
    characters. I suppose we just have to say that behaviour with undefined
    characters is undefined, as we don't yet know whether they are punctuation
    or not.
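
    As a rough illustration of the rule I have in mind, here is a minimal
    Python sketch. It is only a sketch under assumptions: the OPERATORS set
    and the classify function are hypothetical names invented for this
    example, not anything from the draft, and the general categories come
    from whatever Unicode version the running Python happens to ship.

```python
import unicodedata

# Hypothetical set of characters with specifically defined operator functions.
# Illustrative only; the draft does not define this list.
OPERATORS = frozenset("+-*/=<>")


def classify(ch: str) -> str:
    """Classify one character as 'operator', 'syntax-error' or 'identifier'.

    Simplification (assumption): whitespace, digits and symbols are not
    treated separately; anything assigned that is not punctuation is
    accepted in identifiers.
    """
    if ch in OPERATORS:
        return "operator"
    category = unicodedata.category(ch)
    if category == "Cn":
        # Unassigned in this Unicode version: reserved, so it may later
        # turn out to be punctuation or a letter.
        return "syntax-error"
    if category.startswith("P"):
        # Punctuation with no defined operator function: reserved for now.
        return "syntax-error"
    return "identifier"


if __name__ == "__main__":
    for ch in ["+", "!", "a", "\u2d30"]:
        print(f"U+{ord(ch):04X}", classify(ch))
```

    The point of the sketch is that a code point which is unassigned today
    is rejected rather than guessed at, so a later Unicode version is free
    to make it either an operator candidate or an identifier character.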

    I note the following from the TR31 draft:

    > For stability, the property values will be absolutely invariant; not
    > changing with successive versions of Unicode. Of course, this doesn't
    > limit the ability of the Unicode Standard to add more symbol or
    > whitespace characters, but the syntax and whitespace characters
    > recommended for use in patterns would not change.

    I am not sure what is meant here by "symbol ... characters", a term not
    otherwise defined in this draft. Maybe this is an error for "syntax ...
    characters" as later in the sentence. Does "absolutely invariant" mean
    that once a character is defined as a syntax character, or as
    a whitespace character, it will always remain one, but that additional
    characters may be defined as syntax characters, or as whitespace
    characters, in later versions? If so, we don't have a problem, as we can
    add Tifinagh punctuation, and Arabic and Hebrew punctuation, in later
    versions as required. My problem is if the list is to be understood as
    complete now for all time. I would see this as both unnecessary and
    problematic. The way round this is to define syntax relative to a
    specific version of Unicode.
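
    To make the "may grow but never shrink" reading concrete, here is a
    minimal Python sketch. Everything in it is an assumption for the sake
    of illustration: the version labels "v1" and "v2", the table
    SYNTAX_BY_VERSION and both function names are hypothetical, and the
    draft does not confirm that this reading is the intended one.

```python
# Hypothetical syntax-character sets keyed by an illustrative version label.
# The labels and contents are invented for this example.
V1 = frozenset("!?;:")
V2 = V1 | {"\u2d70"}  # adds U+2D70 TIFINAGH SEPARATOR MARK in a later version

SYNTAX_BY_VERSION = {"v1": V1, "v2": V2}


def is_syntax(ch: str, version: str) -> bool:
    """Report whether ch is a syntax character relative to the given version."""
    return ch in SYNTAX_BY_VERSION[version]


def is_stable(history: list) -> bool:
    """Check that each version's set contains every member of its predecessor.

    This encodes the reading that a syntax character always remains one,
    while later versions may add more.
    """
    return all(old <= new for old, new in zip(history, history[1:]))


if __name__ == "__main__":
    print(is_syntax("\u2d70", "v1"), is_syntax("\u2d70", "v2"))
    print(is_stable([V1, V2]), is_stable([V2, V1]))
```

    A parser written this way states which version it implements, and an
    older parser simply rejects the newer punctuation instead of silently
    treating it as something else.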

    On the point that having a small fixed list saves storage space, I can
    see that it might in the short term, but in the medium term it will
    increase complexity because so many workarounds become necessary, just
    as with the incorrectly fixed combining classes in Hebrew and elsewhere.

    As for goalkeepers (not door-keepers), I don't see your point. You can
    accuse me of trying to change soccer into American football or vice
    versa if you like, to which my reply is that the current rules are only
    a proposed draft, and my rules (like soccer!) are more globally
    acceptable, even for Tifinagh users. But I am talking about the same
    general kind of ball game, with some adjustments.

    Peter Kirk
