Re: Rationale wanted for Unicode identifier rules

From: Kenneth Whistler (kenw@sybase.com)
Date: Wed Mar 01 2000 - 16:41:43 EST


John Cowan asked:

>
> Kenneth Whistler wrote:
>
> > A. Identifier syntax along the lines described in Unicode 3.0.
>
> Can you (or someone) supply a precis of this to the poor fellow
> who still hasn't heard from his bookstore's order department?
> Especially if it is indeed simpler than the Unicode 2.0 version?

Sure. For those of you who already have the hymnal, turn to page 134 to
sing along.

<identifier> ::= <identifier_start> (<identifier_start> | <identifier_extend>)*

<identifier_start> is defined by an equivalent category set consisting of
       all those characters with the General Category values:
       Lu, Ll, Lt, Lm, Lo, Nl

<identifier_extend> is defined by an equivalent category set consisting of
       all those characters with the General Category values:
       Mn, Mc, Nd, Pc, Cf

Thus, identifiers can start with any "letter" or "letter number".

Identifiers can continue with any "letter" or "letter number", any combining
mark except the enclosing marks (the symbolic surrounds, Me), any decimal
digit, any connecting punctuation, or any format control character (e.g. the
invisible bidi layout controls, ZWJ, ZWNJ, etc.).
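
For readers who would rather poke at the rule than read it, here is a minimal
sketch of the grammar above in Python. It is an illustration, not part of the
standard: the names ID_START, ID_EXTEND and is_identifier are made up for this
example, and Python's unicodedata module reports categories from whatever
Unicode version ships with the interpreter, not Unicode 3.0, so individual
characters may be classified differently than in the 3.0 data files.

```python
import unicodedata

# General Category sets quoted from the definition above.
# The constant and function names are hypothetical, chosen for this sketch.
ID_START = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_EXTEND = {"Mn", "Mc", "Nd", "Pc", "Cf"}
ID_CONTINUE = ID_START | ID_EXTEND


def is_identifier(s: str) -> bool:
    """Check <identifier_start> (<identifier_start> | <identifier_extend>)*."""
    if not s:
        # The grammar requires at least one start character.
        return False
    if unicodedata.category(s[0]) not in ID_START:
        return False
    # Every remaining character must be a start or an extend character.
    return all(unicodedata.category(ch) in ID_CONTINUE for ch in s[1:])


print(is_identifier("abc_1"))    # letters, connector punctuation (Pc), digit
print(is_identifier("_abc"))     # Pc is extend-only, so it cannot start
print(is_identifier("e\u0301"))  # letter followed by a nonspacing mark (Mn)
```

Under this sketch a leading underscore is rejected, because Pc appears only in
the extend set; a language that wants "_foo" would have to add it to the start
set as its own extension.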

Note that this definition explicitly excludes the following General Category
values from identifiers:

   Me, No, Zs, Zl, Zp, Cc, Pd, Ps, Pe, Pi, Pf, Po, Sm, Sc, Sk, So

i.e. enclosing combining marks, "other numerals", all spaces, control
characters, all other punctuation, and all "symbols".
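
The exclusion list can be spot-checked the same way. The following is a rough
sketch, assuming Python's unicodedata module (whose data tracks the
interpreter's Unicode version rather than 3.0); EXCLUDED and excluded_chars
are hypothetical names for this example. It only reports characters whose
category is in the list quoted above and makes no claim about categories the
list does not mention.

```python
import unicodedata

# The General Category values listed above as excluded from identifiers.
# Names here are hypothetical, chosen for this sketch.
EXCLUDED = {
    "Me", "No", "Zs", "Zl", "Zp", "Cc",
    "Pd", "Ps", "Pe", "Pi", "Pf", "Po",
    "Sm", "Sc", "Sk", "So",
}


def excluded_chars(s: str):
    """Return (index, character, category) for each excluded character."""
    hits = []
    for i, ch in enumerate(s):
        cat = unicodedata.category(ch)
        if cat in EXCLUDED:
            hits.append((i, ch, cat))
    return hits


# Hyphen-minus is dash punctuation (Pd), so "a-b" is not one identifier.
print(excluded_chars("a-b"))
# Underscore (Pc) and decimal digits (Nd) are not in the excluded list.
print(excluded_chars("abc_1"))
```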

--Ken


