ID_Start, ID_Continue, and stability extensions

From: Mathias Bynens <mathias_at_qiwi.be>
Date: Wed, 23 Apr 2014 19:18:48 +0200

http://www.unicode.org/reports/tr31/#Default_Identifier_Syntax defines ID_Start as:

> Characters having the Unicode General_Category of uppercase letters (Lu), lowercase letters (Ll), titlecase letters (Lt), modifier letters (Lm), other letters (Lo), letter numbers (Nl), minus Pattern_Syntax and Pattern_White_Space code points, plus stability extensions. Note that “other letters” includes ideographs.

What are the “stability extensions” this document refers to?

I noticed that parsing `DerivedCoreProperties.txt` for `ID_Start` gives slightly different results than taking the union of the above categories from `UnicodeData.txt` and then subtracting the `Pattern_Syntax` and `Pattern_White_Space` code points, both of which can be obtained by parsing `PropList.txt`.

For example, U+2118 SCRIPT CAPITAL P is included in `ID_Start` according to `DerivedCoreProperties.txt`, but its General_Category (Sm) is not one of the above categories. Is this an example of such a “stability extension”, or was it an oversight?
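For anyone wanting to reproduce the discrepancy, here is a minimal Python sketch of the comparison described above. It assumes the contents of `UnicodeData.txt`, `PropList.txt`, and `DerivedCoreProperties.txt` have already been read into strings; the function names (`parse_unicode_data`, `parse_property_file`, `naive_id_start`, `compare`) are hypothetical helpers, not part of any Unicode tooling. The sketch deliberately applies no stability extensions, so any code point that `DerivedCoreProperties.txt` lists as `ID_Start` but the category-based computation does not (such as U+2118) ends up in the first returned set.

```python
from typing import Dict, Optional, Set, Tuple

# General categories named in the UAX #31 definition of ID_Start quoted above.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}


def parse_unicode_data(text: str) -> Dict[int, str]:
    """Map each code point to its General_Category from UnicodeData.txt text.

    Large blocks (e.g. CJK ideographs) appear as a pair of lines whose names
    end in ", First>" and ", Last>"; every code point in between shares the
    category given on that pair.
    """
    categories: Dict[int, str] = {}
    range_start: Optional[int] = None
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        cp, name, gc = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            range_start = cp
        elif name.endswith(", Last>") and range_start is not None:
            for c in range(range_start, cp + 1):
                categories[c] = gc
            range_start = None
        else:
            categories[cp] = gc
    return categories


def parse_property_file(text: str, prop: str) -> Set[int]:
    """Collect the code points assigned `prop` in a file using the
    PropList.txt layout ("XXXX..YYYY ; Property # comment")."""
    result: Set[int] = set()
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue
        cps, _, name = data.partition(";")
        if name.strip() != prop:
            continue
        lo, _, hi = cps.strip().partition("..")
        result.update(range(int(lo, 16), int(hi or lo, 16) + 1))
    return result


def naive_id_start(unicode_data: str, prop_list: str) -> Set[int]:
    """Union of the listed categories minus Pattern_Syntax and
    Pattern_White_Space. Assumption: no stability extensions are applied."""
    gc = parse_unicode_data(unicode_data)
    letters = {cp for cp, cat in gc.items() if cat in ID_START_CATEGORIES}
    excluded = parse_property_file(prop_list, "Pattern_Syntax") | \
        parse_property_file(prop_list, "Pattern_White_Space")
    return letters - excluded


def compare(unicode_data: str, prop_list: str,
            derived_core: str) -> Tuple[Set[int], Set[int]]:
    """Return (only in DerivedCoreProperties.txt, only in the naive result)."""
    naive = naive_id_start(unicode_data, prop_list)
    derived = parse_property_file(derived_core, "ID_Start")
    return derived - naive, naive - derived
```

With local copies of the three files, something like `extra, missing = compare(*(open(f, encoding="utf-8").read() for f in ("UnicodeData.txt", "PropList.txt", "DerivedCoreProperties.txt")))` followed by `print(sorted(hex(cp) for cp in extra))` lists the code points that the category-based definition alone does not account for.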

Regards,
Mathias
_______________________________________________
Unicode mailing list
Unicode_at_unicode.org
http://unicode.org/mailman/listinfo/unicode
Received on Wed Apr 23 2014 - 12:20:15 CDT
