XML identifier names


Mark Davis



The UTC understands the need for stability in identifier names in XML and similar "meta-language" protocols. It is comfortable with the approach of reserving a small number of character ranges for non-identifier use and accepting all other characters in identifiers. Mechanisms such as Schema and DTDs can be used to limit the characters that are actually used.

If ranges of characters are to be excluded from such identifiers to allow for future syntax characters, we recommend that those ranges be:

  1. 0000..00FF*
  2. 2000..203E
  3. 2041..205F
  4. 2190..27FF
  5. 2900..2BFF

* A subset of the characters in #1 should be treated on a case-by-case basis for backwards compatibility with current identifiers.
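
The range list above can be read as a simple membership test. The following is a minimal sketch, not a definitive implementation: the function names are hypothetical, and the exception set for range 1 (ASCII letters, digits, and underscore) is an assumption made only for illustration, since the note above leaves those characters to a case-by-case decision. The sketch also ignores the distinction between identifier-start and identifier-continue characters.

```python
# Ranges proposed for exclusion from identifiers (inclusive code points),
# copied from the numbered list above.
EXCLUDED_RANGES = [
    (0x0000, 0x00FF),  # range 1: subject to case-by-case exceptions
    (0x2000, 0x203E),  # range 2
    (0x2041, 0x205F),  # range 3
    (0x2190, 0x27FF),  # range 4
    (0x2900, 0x2BFF),  # range 5
]

# ASSUMPTION: a hypothetical carve-out from range 1 for backwards
# compatibility. The actual subset is not specified in the text.
ASSUMED_EXCEPTIONS = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_"
)


def in_excluded_range(ch):
    """Return True if the character falls in one of the five ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXCLUDED_RANGES)


def is_allowed_in_identifier(ch):
    """Allow assumed exceptions, then everything outside the ranges."""
    if ch in ASSUMED_EXCEPTIONS:
        return True
    return not in_excluded_range(ch)


def is_stable_identifier(name):
    """Return True if name is non-empty and every character is allowed."""
    return len(name) > 0 and all(is_allowed_in_identifier(c) for c in name)
```

Under these assumptions, `is_stable_identifier("a\u2192b")` is rejected because U+2192 lies in range 4, while U+203F and U+2040, which sit in the gap between ranges 2 and 3, remain available to identifiers.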

The goal is that all valid Unicode identifiers remain valid XML identifiers.

The UTC will:

A. Request that WG2 add to the principles and procedures that:

B. Add to Unicode Policies that:

(a) There will be no more non-characters, so that the ranges above are stable.

(b) Default ignorable characters will not be assigned outside of the Default_Ignorable_Code_Point ranges, nor will non-default ignorable characters be assigned inside those ranges.

C. Add a subsection to the standard on identifiers that describes the use of stable identifiers for meta-languages, and the tradeoffs involved in using them.