There is an identifier space and an operator/syntactic-element space. The
question is not whether a happy face should be allowed in identifiers, but
which is more important: being able to use a happy face as an operator or
in an identifier. In general, it is easier to take something that was
originally not available for identifiers and change the rules to allow
it than to take something that was once allowed in an identifier and make
it no longer valid. Therefore, my tendency would be against allowing things
in identifiers without a sound principle (e.g. those expressed in 5.14 of
Unicode 2.0).
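
To make the trade-off concrete, here is a rough sketch in Python of a
conservative identifier check, combined with the normalization step raised
in the quoted message. The category sets are my assumption of roughly what
5.14 intends (letters to start; letters, marks, digits, and connector
punctuation to continue), not a quotation of the standard, and the function
names are made up for illustration.

```python
import unicodedata

# Assumed approximation of the 5.14 character classes (not quoted from the standard).
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
EXTEND_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def canonical(text):
    """Fold precomposed and combining spellings into one form (NFC)."""
    return unicodedata.normalize("NFC", text)


def is_identifier(text):
    """True if text starts with a letter and continues with letters,
    marks, digits, or connector punctuation such as underscore."""
    text = canonical(text)
    if not text:
        return False
    if unicodedata.category(text[0]) not in START_CATEGORIES:
        return False
    return all(unicodedata.category(ch) in EXTEND_CATEGORIES for ch in text[1:])


def same_identifier(a, b):
    """Compare two spellings after normalization, as a lexer might."""
    return canonical(a) == canonical(b)
```

Under these assumptions U+263A (happy face) and U+2603 (snowman) are
rejected because they are symbols rather than letters, which leaves them
free for later use as operators. Adding a category to EXTEND_CATEGORIES
later would break no existing program; removing one could.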
On Wed, 1 Mar 2000 email@example.com wrote:
> I got my Unicode 3.0 book this morning (thank you Amazon)... but it's at
> home so I can't refer to it.
> The only thing off the top of my head that I can think of here is that
> we might want to prevent certain "equivalent" characters or compatibility
> characters from being used in identifiers. In other words, if you pass
> the code text through a normalization the identifiers should all be
> The point here is that the use of combining characters versus precomposed
> characters should not result in *separate* identifiers: if it looks the
> same on the screen it should be the same to the compiler. This implies
> normalizing the text as a precondition to lexing and depending on which
> normalization form you choose the punctuation and other characters could
> be normalized into illegal sequences... so not everything above U+00A0 is
> Addison P. Phillips
> Senior Globalization Consultant
> Global Sight Corporation
> 101 Metro Drive, Suite 750
> San Jose, California 95110 USA
> (+1) 408.350.3649 - Phone
> Sent by: John Cowan <firstname.lastname@example.org>
> 03/01/2000 10:46 AM
> To: "Unicode List" <email@example.com>
> Subject: Rationale wanted for Unicode identifier rules
> (Still waiting for my bookstore to get 3.0 book.)
> Section 5.14 of 2.0 says:
> # The formal syntax provided here is intended to capture the general
> # intent that an identifier consists of a string of characters that begins
> # with a letter or an ideograph, and then follows with any number of
> # ideographs, digits, or underscores.
> Can anyone give me a rationale for rejecting the following argument:
> > There are some [syntax] characters we know we need to prohibit [in
> > identifiers, such as +, -, etc.], as well as a couple of ranges of
> > control characters, but other than that I'm not sure why it's worth
> > bothering.
> > [...] I don't see the need for prohibiting every possible
> > punctuation character or characters such as a smiley or a snow man,
> > even though I would probably not use them in an [identifier] myself. As
> > long as they don't conflict with the [rest of the] syntax, it makes no
> > difference [to the] parser.
> In other words, programming languages have historically tended to allow
> anything in an identifier that wasn't used for some syntactic purpose;
> leading digits were forbidden to make lexers simpler. What specific
> reason is there not to treat all hitherto-unknown Unicode characters
> as legitimate in identifiers, in the manner of the Plan9 C compiler
> (which extends C to treat everything from U+00A0 on up as valid)?
> I need this to help me write a draft standard, so I'm not asking out
> of randomness.
> Schlingt dreifach einen Kreis vom dies! || John Cowan
> Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
> Denn er genoss vom Honig-Tau, || http://www.ccil.org/~cowan
> Und trank die Milch vom Paradies. -- Coleridge (tr. Politzer)