Re: First draft of proposed XML TC for Unicode 3.0 (unofficial)

From: John Cowan (cowan@locke.ccil.org)
Date: Thu Sep 09 1999 - 18:21:13 EDT


I wrote:

> In addition, the following characters no longer pass the tests given
> in Appendix B for valid name or name-start characters, but should
> remain legal in XML names for backward compatibility, and therefore
> should be explicitly enumerated in the corrigendum:

An explanation of these seems to be in order.

> 03D0;GREEK BETA SYMBOL
> 03D1;GREEK THETA SYMBOL
> 03D2;GREEK UPSILON WITH HOOK SYMBOL
> 03D5;GREEK PHI SYMBOL
> 03D6;GREEK PI SYMBOL
> 03F0;GREEK KAPPA SYMBOL
> 03F1;GREEK RHO SYMBOL
> 03F2;GREEK LUNATE SIGMA SYMBOL
> 0675;ARABIC LETTER HIGH HAMZA ALEF
> 0676;ARABIC LETTER HIGH HAMZA WAW
> 0677;ARABIC LETTER U WITH HAMZA ABOVE
> 0678;ARABIC LETTER HIGH HAMZA YEH
> 0E33;THAI CHARACTER SARA AM
> 0EB3;LAO VOWEL SIGN AM
> 0F77;TIBETAN VOWEL SIGN VOCALIC RR
> 0F79;TIBETAN VOWEL SIGN VOCALIC LL
> 1E9A;LATIN SMALL LETTER A WITH RIGHT HALF RING

All of the above are now excluded by the Appendix B rules because they
have been given compatibility decompositions, and any character with a
compatibility decomposition is excluded by the fourth rule in Appendix B.
For example, GREEK BETA SYMBOL is now considered a compatibility
equivalent of GREEK SMALL LETTER BETA. In the cases of THAI CHARACTER
SARA AM and LAO VOWEL SIGN AM, the previous canonical decompositions
were changed to compatibility decompositions. The other characters
formerly had no decompositions at all.
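
For anyone who wants to check the decomposition field mechanically, here
is a minimal sketch in Python. It assumes the standard unicodedata module,
which reflects whatever Unicode version ships with the interpreter rather
than Unicode 3.0 itself, so it shows only the present state of the data.
The helper name has_compat_decomposition is my own invention for
illustration; the test it applies (decomposition field beginning with "<")
is the one Appendix B describes for tagged decompositions.

```python
import unicodedata

# Code points enumerated in the message above.
LISTED = [
    0x03D0, 0x03D1, 0x03D2, 0x03D5, 0x03D6, 0x03F0, 0x03F1, 0x03F2,
    0x0675, 0x0676, 0x0677, 0x0678, 0x0E33, 0x0EB3, 0x0F77, 0x0F79,
    0x1E9A,
]


def has_compat_decomposition(cp):
    """Return True if the decomposition mapping carries a <tag>.

    unicodedata.decomposition() returns a string such as "<compat> 03B2"
    for a tagged (compatibility) mapping, an untagged list of code points
    for a canonical mapping, and "" when there is no mapping.
    Hypothetical helper, not part of any specification.
    """
    return unicodedata.decomposition(chr(cp)).startswith("<")


if __name__ == "__main__":
    for cp in LISTED:
        ch = chr(cp)
        print("%04X  %-42s %s" % (
            cp, unicodedata.name(ch), unicodedata.decomposition(ch)))
```

Every listed code point should show a tagged mapping, whereas an ordinary
letter such as GREEK SMALL LETTER BETA (03B2) shows none.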

Note that all these characters are still considered letters; the
appearance of SYMBOL in some of the names is not an indication that
they are symbols.

> 212E;ESTIMATED SYMBOL

This character was erroneously classified as a letter and is now
rightly classified as a symbol.
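
The reclassification can be seen in the general category field. Another
small sketch under the same assumption (Python's unicodedata, current
data rather than Unicode 3.0); is_letter is a hypothetical helper that
treats any general category beginning with "L" as a letter, which is
only a rough stand-in for the Appendix B category rules.

```python
import unicodedata


def is_letter(ch):
    # Rough approximation: general categories Lu, Ll, Lt, Lm, Lo.
    # Hypothetical helper for illustration only.
    return unicodedata.category(ch).startswith("L")


if __name__ == "__main__":
    for ch in ("\u212E", "\u03D0"):
        print("%04X  %-22s %s  letter=%s" % (
            ord(ch), unicodedata.name(ch),
            unicodedata.category(ch), is_letter(ch)))
```

ESTIMATED SYMBOL reports a symbol category, while GREEK BETA SYMBOL
still reports a letter category despite its name.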

-- 
John Cowan                                   cowan@ccil.org
       I am a member of a civilization. --David Brin


