Re: Rationale wanted for Unicode identifier rules

From: Tex Texin
Date: Sun Mar 05 2000 - 01:45:25 EST

It is a good point.
In thinking about schemes that provide version compatibility, we should
perhaps be more specific about the behaviors we want to preserve.

So, if I use a (new) character in an identifier name, we should
ask what should happen in a version that doesn't know the character
and in one that does, with respect to various operations.

I think a nice property of an escape mechanism could be that
regardless of the version, escaped characters could follow
their own rules.

So properties relevant to identifiers that need to be considered:

So if a character is defined more recently than XML version x,
XML can either: reject it, use some presumed default behaviors, know
where to look up the behaviors, or ?...

If the character is escaped, XML could stipulate that the character
will not be folded, normalized, and will be sorted based on its binary
code point, etc. And if a later version of XML supports the character
and knows its properties, it should still treat it as escaped, for
consistency, if the escape character precedes it.
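To make that concrete, here is a rough sketch in Python. It assumes a
hypothetical `_xHHHH_` notation for an escaped code point (the actual
syntax of the proposal is not given in this thread), and the function
names are made up for illustration. Unescaped runs are normalized and
case-folded; escaped characters are kept as raw code points, whatever
the software happens to know about them.

```python
import re
import unicodedata

# Assumption: "_xHHHH_" (4 to 6 hex digits) encodes one escaped code point.
# This notation is illustrative only, not the one from the proposal.
ESCAPE = re.compile(r"_x([0-9A-Fa-f]{4,6})_")


def escape_char(ch):
    """Return the (hypothetical) escaped form of a single character."""
    return "_x{:04X}_".format(ord(ch))


def fold(text):
    """Normalize and case-fold an unescaped run of characters."""
    return unicodedata.normalize("NFC", text).casefold()


def comparison_key(identifier):
    """Build a comparison key for an identifier.

    Unescaped runs become (0, folded_text); escaped characters become
    (1, code_point) and are never folded or normalized.
    """
    key = []
    pos = 0
    for match in ESCAPE.finditer(identifier):
        plain = identifier[pos:match.start()]
        if plain:
            key.append((0, fold(plain)))
        key.append((1, int(match.group(1), 16)))
        pos = match.end()
    tail = identifier[pos:]
    if tail:
        key.append((0, fold(tail)))
    return tuple(key)
```

With this key, `Name_x00E9_` and `NAME_x00E9_` compare equal because the
unescaped part is folded, while `_x00E9_` and `_x00C9_` stay distinct
because escaped characters follow their own rules. Placing unescaped
runs before escaped characters in the ordering is an arbitrary choice
of this sketch.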

We need to consider some display issues. I presume the escape
character would always be indicated.

What happens if the escaped character is a combining character?
I dunno, software today needs to know so many things about a character,
and for interoperability each software package needs to be sure to do
the same thing with each character.
I think it might be better for the software packages to have a protocol
to negotiate the greatest XML version that they have in common.
If a new character is really required, then all software has to be
upgraded to the supporting XML version, or maybe newer software uses
fallback when interoperating with software that is of a lower version....
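
A minimal sketch of that kind of negotiation, in Python. The version
numbers and the function name are hypothetical; no such protocol is
defined by XML. Each side advertises the versions it supports and both
settle on the greatest one they share.

```python
def negotiate(local_versions, remote_versions):
    """Return the greatest version supported by both sides, or None.

    Versions are (major, minor) tuples, so max() compares them in the
    natural order. The version values used by callers are assumptions.
    """
    common = set(local_versions) & set(remote_versions)
    return max(common) if common else None
```

For example, a package supporting (1, 0), (1, 1) and (2, 0) talking to
one that supports only (1, 0) and (1, 1) would settle on (1, 1). A
result of None means there is no common version, so either an upgrade
or some fallback is needed.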

maybe this'll seem clearer in the morning.

Paul Dempsey wrote:
> re. XML identifiers
> An interesting document I've seen recently proposes an encoding scheme to
> represent Unicode characters that are otherwise not allowed in XML 1.0
> identifiers. I can't share the specific proposal, but I expect the principle
> should be well understood by the members of this list. It uses '_' as an
> escape character to signal a notation for an arbitrary Unicode character
> (some simple additional rules are applied to decide if you have an encoded
> non-XML1.0 character). It's similar to other schemes to extend the range of
> a representation required by a given process (such as DNS names).
> Using such a scheme allows you to present names to the user using the full
> range of Unicode (any version) and keep complete compatibility with XML 1.0.
> These simple escape/encoding schemes seem to be a common technique for
> extending standards/syntaxes. I applied this myself in a tool for encoding
> WinHelp topic identifiers.
> Perhaps authors of new standards can anticipate this need and define the
> extension mechanism as part of the initial standard, instead of having the
> extension retrofitted.
> Your thoughts?
> Regards,
> --- Paul Chase Dempsey
> Microsoft Visual Studio Text Editor Development

Tex Texin                     Director, International Products
Progress Software Corp.       +1-781-280-4271
14 Oak Park                   +1-781-280-4655 (Fax)
Bedford, MA 01730  USA