From: Mark Davis
Date: Sat, Apr 12, 2008
Subject: Being aware of Pattern_Syntax when allocating characters

We need to remind WG2 (and ourselves) that any character placed in the code point ranges covered by Pattern_Syntax is irrevocably barred from use in Unicode identifiers under our stability policies. This should also be incorporated into the policies and procedures, so that no character that might be reasonable in identifiers is allocated in those ranges by future amendments of ISO 10646 or versions of Unicode. In particular, this includes characters that are (or might be changed in the future to be) Letters, Marks, Numbers, or Connector-Punctuation. (More precisely, in UnicodeSet syntax this is [[:L:][:Mn:][:Mc:][:Nl:][:Nd:][:Pc:]]; see http://www.unicode.org/reports/tr31/#Default_Identifier_Syntax.) So here's the proposal.

Proposal: Add to the policies and procedures the following text:

Characters that are Letters, Marks, Numbers, or Connector-Punctuation characters (LMNC) must not be allocated in the following blocks. Moreover, any characters that have a significant chance of being recategorized in the future to be LMNC characters must also not be allocated in those blocks. This keeps such characters available for use in programmatic identifiers, which have certain stability constraints: a character allocated in the code point ranges covered by Pattern_Syntax can never be allowed in identifiers. See http://www.unicode.org/reports/tr31/#Default_Identifier_Syntax.

  • Miscellaneous_Technical
  • Control_Pictures
  • Optical_Character_Recognition
  • Miscellaneous_Symbols
  • Dingbats
  • Miscellaneous_Mathematical_Symbols_A
  • Miscellaneous_Symbols_And_Arrows
  • Supplemental_Punctuation
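
As an illustration of the LMNC test, here is a minimal Python sketch of the set [[:L:][:Mn:][:Mc:][:Nl:][:Nd:][:Pc:]]. The function name is_lmnc is hypothetical, and the result depends on the Unicode version bundled with the Python interpreter (reported by unicodedata.unidata_version), which will generally not be 5.1.

```python
import unicodedata

# General_Category values named explicitly in
# [[:L:][:Mn:][:Mc:][:Nl:][:Nd:][:Pc:]]; every L* category is also included.
LMNC_NON_LETTER = {"Mn", "Mc", "Nl", "Nd", "Pc"}


def is_lmnc(ch: str) -> bool:
    """Return True if ch is in the LMNC set (hypothetical helper).

    Uses the Unicode data that ships with this Python build, so the answer
    reflects that version's General_Category assignments.
    """
    category = unicodedata.category(ch)
    return category.startswith("L") or category in LMNC_NON_LETTER
```

Note that the precise set is narrower than the words "Marks" and "Numbers" suggest: it leaves out enclosing marks (Me) and other numbers (No), so is_lmnc("\u00B2") (SUPERSCRIPT TWO) is False while is_lmnc("\u2160") (ROMAN NUMERAL ONE) is True.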

In Unicode 5.1, the unassigned code points in those blocks fall in the following ranges.

23E8..23FF     # Miscellaneous_Technical                  [24]
2427..243F     # Control_Pictures                         [25]
244B..245F     # Optical_Character_Recognition            [21]
269E..269F     # Miscellaneous_Symbols                     [2]
26BD..26BF     # Miscellaneous_Symbols                     [3]
26C4..26FF     # Miscellaneous_Symbols                    [60]
2700           # Dingbats
2705           # Dingbats
270A..270B     # Dingbats                                  [2]
2728           # Dingbats
274C           # Dingbats
274E           # Dingbats
2753..2755     # Dingbats                                  [3]
2757           # Dingbats
275F..2760     # Dingbats                                  [2]
2795..2797     # Dingbats                                  [3]
27B0           # Dingbats
27BF           # Dingbats
27CB           # Miscellaneous_Mathematical_Symbols_A
27CD..27CF     # Miscellaneous_Mathematical_Symbols_A      [3]
2B4D..2B4F     # Miscellaneous_Symbols_And_Arrows          [3]
2B55..2BFF     # Miscellaneous_Symbols_And_Arrows        [171]
2E31..2E7F     # Supplemental_Punctuation                 [79]
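
The table above can be read mechanically, which may help when checking a proposed allocation against it. The following Python sketch parses the listing and answers two questions: how many code points it covers, and whether a given code point is one of them. The names TABLE, parse_ranges, total_code_points, and is_listed are hypothetical; the data is copied from the listing above.

```python
# Ranges copied from the Unicode 5.1 listing above (comments and counts dropped).
TABLE = """\
23E8..23FF
2427..243F
244B..245F
269E..269F
26BD..26BF
26C4..26FF
2700
2705
270A..270B
2728
274C
274E
2753..2755
2757
275F..2760
2795..2797
27B0
27BF
27CB
27CD..27CF
2B4D..2B4F
2B55..2BFF
2E31..2E7F
"""


def parse_ranges(text: str) -> list[tuple[int, int]]:
    """Parse 'XXXX' or 'XXXX..YYYY' lines into inclusive (first, last) pairs."""
    ranges = []
    for line in text.splitlines():
        spec = line.split("#")[0].strip()  # tolerate trailing '# comment' text
        if not spec:
            continue
        first, _, last = spec.partition("..")
        # A single code point has no '..', so last is empty; reuse first.
        ranges.append((int(first, 16), int(last or first, 16)))
    return ranges


def total_code_points(ranges: list[tuple[int, int]]) -> int:
    """Count the code points covered by the inclusive ranges."""
    return sum(last - first + 1 for first, last in ranges)


def is_listed(cp: int, ranges: list[tuple[int, int]]) -> bool:
    """Return True if cp falls inside one of the listed ranges."""
    return any(first <= cp <= last for first, last in ranges)
```

For this listing, total_code_points(parse_ranges(TABLE)) is 410, which matches the sum of the bracketed counts plus one for each single code point.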