(Still waiting for my bookstore to get the 3.0 book.)
Section 5.14 of 2.0 says:
# The formal syntax provided here is intended to capture the general
# intent that an identifier consists of a string of characters that starts
# with a letter or an ideograph, and then follows with any number of letters,
# ideographs, digits, or underscores.
Can anyone give me a rationale for rejecting the following argument:
> There are some [syntax] characters we know we need to prohibit [in
> identifiers, such as +, -, etc.], as well as a couple of ranges of
> control characters, but other than that I'm not sure why it's worth
> [...] I don't see the need for prohibiting every possible
> punctuation character or characters such as a smiley or a snow man,
> even though I would probably not use them in an [identifier] myself. As
> long as they don't conflict with the [rest of the] syntax, it makes no
> difference [to the] parser.
In other words, programming languages have historically tended to allow
anything in an identifier that wasn't used for some syntactic purpose;
leading digits were forbidden to make lexers simpler. What specific
reason is there not to treat all hitherto-unknown Unicode characters
as legitimate in identifiers, in the manner of the Plan 9 C compiler
(which extends C to treat everything from U+00A0 on up as valid)?
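To make the contrast concrete, here is a minimal Python sketch of the two policies. It is an illustration under stated assumptions, not the text of either specification: the function names are hypothetical, the "strict" rule approximates the general intent quoted from section 5.14 using Unicode general categories (letters and ideographs as L*, digits as Nd), and the "permissive" rule follows the description above of accepting everything from U+00A0 up, with ASCII restricted to letters, digits, and underscore.

```python
import unicodedata


def is_letter_or_ideograph(ch):
    # Assumption: "letter or ideograph" is approximated by the general
    # categories Lu, Ll, Lt, Lm, Lo (ideographs fall under Lo).
    return unicodedata.category(ch).startswith("L")


def strict_identifier(s):
    """Approximate the intent quoted from section 5.14: a letter or
    ideograph first, then letters, ideographs, digits, or underscores."""
    if not s or not is_letter_or_ideograph(s[0]):
        return False
    return all(
        is_letter_or_ideograph(c)
        or unicodedata.category(c) == "Nd"  # decimal digits
        or c == "_"
        for c in s[1:]
    )


def permissive_identifier(s):
    """Approximate the permissive rule described above: any character at
    or above U+00A0 is accepted; below that, only ASCII letters, digits,
    and underscore, with no leading ASCII digit."""
    if not s or s[0] in "0123456789":
        return False

    def ok(c):
        if ord(c) >= 0xA0:
            return True
        # Everything below U+00A0 that is not an ASCII letter, digit, or
        # underscore (syntax characters, C0 and C1 controls) is rejected.
        return ord(c) < 0x80 and (c.isalnum() or c == "_")

    return all(ok(c) for c in s)
```

Under these assumptions the two rules agree on ordinary names and on syntax characters such as +, and differ only on symbols like the snow man (U+2603), which the strict rule rejects and the permissive rule accepts.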
I need this to help me write a draft standard, so I'm not asking out
Schlingt dreifach einen Kreis vom dies! || John Cowan <email@example.com>
Schliesst euer Aug vor heiliger Schau,  || http://www.reutershealth.com
Denn er genoss vom Honig-Tau,           || http://www.ccil.org/~cowan
Und trank die Milch vom Paradies.            -- Coleridge (tr. Politzer)