L2/05-015

Date: Thu, 20 Jan 2005 10:52:05 -0800
From: Markus Scherer
Subject: Property[Value]Aliases.txt changes for compatibility

Unicode 4.1 beta Property[Value]Aliases.txt Proposed Changes

1. Script code for Coptic

The script code for Coptic was documented as Qaac (although Qaac was not used in the UCD, I believe) before the formal assignment of Copt. Both should be recognized.

Change in PropertyValueAliases.txt

    sc ; Copt ; Coptic

to

    sc ; Copt ; Coptic ; Qaac

(The file already contains additional aliases for backward compatibility, for other property values.)

2. Aliases for POSIX-style character classes

Some of the POSIX-style character classes directly correspond to Unicode properties, or values of Unicode properties, according to the Standard Recommendations in UTS #18 (regex) Annex C: Compatibility Properties.

A regular expression engine using Unicode property [value] aliases automatically gets support for the alpha, upper, and lower POSIX-style character classes. There was a deliberate change of property assignments in Unicode 4.1 to make the Alphabetic property a superset of both Lowercase and Uppercase, so that these would be consistent.

Additional aliases should be added to support further POSIX-style character classes automatically, where there is a direct correspondence.

Change in PropertyValueAliases.txt

    gc ; P ; Punctuation # Pc | Pd | Pe | Pf | Pi | Po | Ps

to

    gc ; P ; Punctuation ; punct # Pc | Pd | Pe | Pf | Pi | Po | Ps

and

    gc ; Nd ; Decimal_Number

to

    gc ; Nd ; Decimal_Number ; digit

and

    gc ; Cc ; Control

to

    gc ; Cc ; Control ; cntrl

Note that these aliases will be recognized in \p{punct} or [:punct:] because the property name is optional for General_Category.

Change in PropertyAliases.txt

    WSpace ; White_Space

to

    WSpace ; White_Space ; space

markus
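To illustrate how an engine might pick up the proposed aliases automatically, here is a minimal Python sketch that parses lines in the PropertyValueAliases.txt format and resolves a name such as punct or sc=Qaac. The function names (loose_key, parse_value_aliases, resolve) are hypothetical and not part of any existing library. The matching is a simplification that only ignores case, spaces, underscores, and hyphens, and the sketch assumes property names are given in their short form (gc, sc) rather than looked up through PropertyAliases.txt.

```python
import re

# Sample lines in the PropertyValueAliases.txt format, copied from the
# proposed "after" state of the changes described above.
PROPOSED_LINES = """\
sc ; Copt ; Coptic ; Qaac
gc ; P ; Punctuation ; punct # Pc | Pd | Pe | Pf | Pi | Po | Ps
gc ; Nd ; Decimal_Number ; digit
gc ; Cc ; Control ; cntrl
"""


def loose_key(name):
    """Simplified loose matching: ignore case, spaces, underscores, hyphens."""
    return re.sub(r"[\s_\-]", "", name).lower()


def parse_value_aliases(text):
    """Map (property, alias) -> short value name (first alias on the line)."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        prop, aliases = fields[0], fields[1:]
        for alias in aliases:
            table[(loose_key(prop), loose_key(alias))] = aliases[0]
    return table


def resolve(table, expr, default_prop="gc"):
    """Resolve 'prop=value' or a bare 'value'.

    A bare value is looked up under General_Category, mirroring the note
    that the property name is optional for gc. Returns None if unknown.
    """
    if "=" in expr:
        prop, value = expr.split("=", 1)
    else:
        prop, value = default_prop, expr
    return table.get((loose_key(prop), loose_key(value)))


table = parse_value_aliases(PROPOSED_LINES)
print(resolve(table, "punct"))    # bare name, looked up under gc
print(resolve(table, "sc=Qaac"))  # old Coptic code maps to the same value as Copt
```

Under this scheme, adding an alias to the data file is enough for \p{punct} or [:punct:] to resolve, with no engine-side special casing. The space alias is proposed in PropertyAliases.txt instead, so it would need a separate lookup for property names that this sketch does not cover.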