RE: Rationale wanted for Unicode identifier rules

From: Paul Dempsey
Date: Sun Mar 05 2000 - 00:39:55 EST

re. XML identifiers

An interesting document I've seen recently proposes an encoding scheme to
represent Unicode characters that are otherwise not allowed in XML 1.0
identifiers. I can't share the specific proposal, but I expect the principle
should be well understood by the members of this list. It uses '_' as an
escape character to signal a notation for an arbitrary Unicode character
(some simple additional rules are applied to decide if you have an encoded
non-XML1.0 character). It's similar to other schemes to extend the range of
a representation required by a given process (such as DNS names).
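
To make the principle concrete, here is a minimal Python sketch of one way
such an escape scheme could work. It is not the proposal mentioned above,
whose details are not given here. The `_xHHHH_` notation (hex code point
between `_x` and `_`), the function names `encode_name` and `decode_name`,
and the deliberately narrow ASCII-only notion of an "allowed" character are
all assumptions made for illustration; real XML 1.0 name rules admit many
more characters.

```python
import re

# Assumed notation: a disallowed character becomes _xHHHH_ (hex code point).
_ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4,6})_")


def _is_plain(ch, first):
    """Simplified stand-in for the XML 1.0 name rules (ASCII subset only)."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return True
    return not first and ch.isascii() and (ch.isdigit() or ch in "-.")


def encode_name(name):
    """Escape every character the simplified rule above does not allow."""
    out = []
    for i, ch in enumerate(name):
        if ch == "_" and _ESCAPE.match(name, i):
            # A literal '_' that would read as the start of an escape
            # sequence is itself escaped, so decoding stays unambiguous.
            out.append("_x005F_")
        elif _is_plain(ch, first=(i == 0)):
            out.append(ch)
        else:
            out.append("_x%04X_" % ord(ch))
    return "".join(out)


def _unescape(match):
    code_point = int(match.group(1), 16)
    # Leave out-of-range sequences untouched rather than fail.
    return chr(code_point) if code_point <= 0x10FFFF else match.group(0)


def decode_name(encoded):
    """Turn each _xHHHH_ sequence back into the character it stands for."""
    return _ESCAPE.sub(_unescape, encoded)
```

For example, `encode_name("na\u00efve name")` gives `na_x00EF_ve_x0020_name`,
and `decode_name` turns that back into the original string. The check for a
literal underscore that happens to begin an escape sequence is the kind of
"simple additional rule" needed to keep the encoding reversible.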

Using such a scheme allows you to present names to the user using the full
range of Unicode (any version) and keep complete compatibility with XML 1.0.

These simple escape/encoding schemes seem to be a common technique for
extending standards/syntaxes. I applied this myself in a tool for encoding
WinHelp topic identifiers.

Perhaps authors of new standards can anticipate this need and define the
extension mechanism as part of the initial standard, instead of having the
extension retrofitted.

Your thoughts?

--- Paul Chase Dempsey
Microsoft Visual Studio Text Editor Development
