
L2/01-454R2

Re:	XML identifier names
From:	Mark Davis
Date:	2001-11-08

The UTC understands the need for stability in identifier names in XML and similar "meta-language" protocols, and is comfortable with the approach of reserving a small number of character ranges for non-identifier use and accepting all other characters in identifiers. Mechanisms such as XML Schema and DTDs can then be used to restrict the characters that are actually used.

We recommend that the following be excluded from such identifiers:

Control characters
Noncharacters
Default-Ignorable ranges:

2060..206F
FFF0..FFFB
E0000..E0FFF

The variation selectors E0110..E01FF, however, are not excluded.
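As an illustration only, the exclusion list above can be expressed as a code-point test. This is a minimal sketch, assuming that "control characters" means General_Category Cc and that "noncharacters" means U+FDD0..U+FDEF plus the last two code points of every plane; the constant and function names are hypothetical.

```python
# Minimal sketch (hypothetical names): does the exclusion list rule out a
# code point for identifiers? Assumes "control characters" = General_Category
# Cc and "noncharacters" = U+FDD0..U+FDEF plus U+xxFFFE/U+xxFFFF in each plane.

# Default-ignorable ranges from the recommendation (inclusive bounds).
DEFAULT_IGNORABLE_RANGES = (
    (0x2060, 0x206F),
    (0xFFF0, 0xFFFB),
    (0xE0000, 0xE0FFF),
)

# Variation selectors that the recommendation leaves usable in identifiers.
VARIATION_SELECTOR_RANGE = (0xE0110, 0xE01FF)


def is_control(cp: int) -> bool:
    """General_Category Cc: C0 controls, DELETE, and C1 controls."""
    return cp <= 0x1F or 0x7F <= cp <= 0x9F


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, or the last two code points of any plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_excluded_from_identifiers(cp: int) -> bool:
    """True if the recommendation would exclude code point cp from identifiers."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if is_control(cp) or is_noncharacter(cp):
        return True
    vs_lo, vs_hi = VARIATION_SELECTOR_RANGE
    if vs_lo <= cp <= vs_hi:
        return False  # carved out of E0000..E0FFF, so still allowed
    return any(lo <= cp <= hi for lo, hi in DEFAULT_IGNORABLE_RANGES)


if __name__ == "__main__":
    for cp in (0x0041, 0x0009, 0xFDD0, 0x2060, 0xE0001, 0xE0110):
        print(f"U+{cp:04X}", is_excluded_from_identifiers(cp))
```

The variation-selector check runs before the default-ignorable ranges so that the carve-out takes precedence over the enclosing E0000..E0FFF range.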

If ranges of characters are to be excluded from identifiers to allow for future syntax characters, we recommend that those ranges be:

1. 0000..00FF*
2. 2000..203E
3. 2041..205F
4. 2190..27FF
5. 2900..2BFF

* A subset of the characters in #1 should be treated on a case-by-case basis for backward compatibility with current identifiers.
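A similar sketch can report which of the suggested syntax ranges, numbered #1 to #5 in the order listed, contains a given code point. The footnote's case-by-case treatment of #1 is modelled here, as an assumption, by a caller-supplied set of exempted code points; the memo does not say which characters belong in that set, and all names are hypothetical.

```python
# Minimal sketch (hypothetical names): which reserved syntax range, if any,
# contains a code point? Range #1 exceptions are supplied by the caller,
# because the memo leaves them to case-by-case decisions.
from typing import AbstractSet, Optional

# Inclusive bounds, in the order listed (#1 through #5).
RESERVED_SYNTAX_RANGES = (
    (0x0000, 0x00FF),  # 1: subject to case-by-case exceptions
    (0x2000, 0x203E),  # 2
    (0x2041, 0x205F),  # 3
    (0x2190, 0x27FF),  # 4
    (0x2900, 0x2BFF),  # 5
)


def reserved_syntax_range(
    cp: int, range1_exceptions: AbstractSet[int] = frozenset()
) -> Optional[int]:
    """Return the 1-based number of the reserved range containing cp, or None.

    Code points in range1_exceptions are treated as identifier characters
    even though they fall in range #1 (backward compatibility).
    """
    for number, (lo, hi) in enumerate(RESERVED_SYNTAX_RANGES, start=1):
        if lo <= cp <= hi:
            if number == 1 and cp in range1_exceptions:
                return None
            return number
    return None


if __name__ == "__main__":
    import string

    # Hypothetical exception set: ASCII letters and digits stay usable.
    keep = frozenset(map(ord, string.ascii_letters + string.digits))
    for ch in "a<\u2190\u203f":
        print(f"U+{ord(ch):04X}", reserved_syntax_range(ord(ch), keep))
```

Note that U+203F UNDERTIE and U+2040 CHARACTER TIE fall in the gap between ranges #2 and #3, so the sketch reports them as unreserved.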

The goal is that all valid Unicode identifiers remain valid XML identifiers.

The UTC will:

A. Request that WG2 add to the principles and procedures that:

no characters suitable for use in identifiers (e.g. letters and numbers) will be encoded in ranges #4 and #5 above;
characters suitable for use as syntax characters should, where possible, be encoded in ranges #4 and #5.

B. Add the following to the Unicode Policies:

(a) There will be no more noncharacters, so that the ranges above remain stable.

(b) Default-ignorable characters will not be assigned outside the Default_Ignorable_Code_Point ranges, nor will non-default-ignorable characters be assigned inside those ranges.

C. Add a subsection on identifiers to the standard that describes the use of stable identifiers for meta-languages and the tradeoffs involved in using them.