Public Review Issue #132: Code Point Name/Label Options


After considering the feedback received, the UTC discussed the following options:

Option A. Define a Code Point Label property (as given in pr129.html). This is a derived property based on the existing Name property, plus constructed values for what are null Name property values in Unicode 5.1.


Option B. Define a Code Point Name property. This is a derived property defined in the same way as the Code Point Label in Option A (just a change of property name).


Option C. Don't define a new property, but instead expand the existing Name property to also cover code points that had null values in Unicode 5.1. For more details about what this would look like, see below.


Option D. Status Quo: do not define a new property, do not change the existing Name property.


The concern with Options A and B is that Unicode is already baroque: the difference between the Name property and a Code Point Name/Label property would seem obscure to users, and would only cause confusion and errors.


The concerns around option C are that this is a change to a long-existing property, and may cause confusion or difficulties for ISO 10646.


The concerns around option D are the continuing confusion between name values and comments supplied in the Unicode Character Database.



In each of these options, the property values would be the following.

Construction of Names/Labels

Type            Value (NNNN represents the code point)
Controls        control-NNNN
Reserved        reserved-NNNN
Noncharacter    noncharacter-NNNN
Private-Use     private-use-NNNN
Surrogate       surrogate-NNNN
Others          Field 1 of UnicodeData or constructed values for Hangul Syllables or CJK Ideographs
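
The construction in the table above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names code_point_label and is_noncharacter are hypothetical, NNNN is assumed to be uppercase hexadecimal with at least four digits (as in the <control-0009> style of example), and the data comes from Python's bundled unicodedata module, which tracks a later Unicode version than 5.1, so the set of reserved code points will differ.

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    # Noncharacters: U+FDD0..U+FDEF plus the last two code points of every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def code_point_label(cp: int) -> str:
    """Return a name/label for any code point, following the table above.

    Hypothetical helper: the NNNN formatting is an assumption of this sketch.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    nnnn = f"{cp:04X}"  # assumed format: uppercase hex, minimum four digits
    category = unicodedata.category(chr(cp))
    if category == "Cc":
        return f"control-{nnnn}"
    if category == "Cs":
        return f"surrogate-{nnnn}"
    if category == "Co":
        return f"private-use-{nnnn}"
    if category == "Cn":
        # Unassigned: either a noncharacter or a reserved code point.
        prefix = "noncharacter" if is_noncharacter(cp) else "reserved"
        return f"{prefix}-{nnnn}"
    # Others: Field 1 of UnicodeData, or the constructed Hangul/CJK names.
    # unicodedata.name raises ValueError if the bundled data has no name.
    return unicodedata.name(chr(cp))


if __name__ == "__main__":
    for cp in (0x0009, 0x0041, 0xAC00, 0xD800, 0xE000, 0xFFFF):
        print(f"U+{cp:04X}", code_point_label(cp))
```

Because reserved code points become assigned over time, the "reserved-NNNN" results depend on which version of the Unicode Character Database the runtime ships.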

Changes if we do option A or B

The changes for A are given in pr129.html, while the changes for B are a straightforward modification of A.

Changes if we do option C

[[ As a 4th bullet under definition D4 Character Name in Chapter 3, insert ]]

[[Incorporate the following text in Section 4.8, "Name -- Normative", as a subsection, with appropriate editorial adjustments to other existing text in that section. ]]

Unicode Code Point Name


The Name property (short alias: "na") is a string property, defined as follows:


When displayed in mixed contexts, to emphasize the distinction between graphic/format code point names and others, the others are often displayed between angle brackets: <control-0009>, <noncharacter-FFFF>, etc.
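
As a small illustration of this display convention, the sketch below wraps only the constructed values in angle brackets. The name display_form and the prefix tuple are hypothetical; the sketch relies on the assumption that names of graphic and format characters never begin with these lowercase prefixes.

```python
# Prefixes of the constructed (non-graphic, non-format) values.
CONSTRUCTED_PREFIXES = (
    "control-",
    "reserved-",
    "noncharacter-",
    "private-use-",
    "surrogate-",
)


def display_form(name: str) -> str:
    """Wrap constructed values in angle brackets for mixed-context display."""
    if name.startswith(CONSTRUCTED_PREFIXES):
        return f"<{name}>"
    return name


if __name__ == "__main__":
    print(display_form("control-0009"))
    print(display_form("LATIN CAPITAL LETTER A"))
```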


Note that the Name property values are unique for all code points. Furthermore, the Name property value uniqueness requirement interacts with name assignment rules for formal aliases and for named character sequences: Unicode character names, formal aliases, and named character sequences constitute a single, unique namespace.


The Name property values for all but reserved code points will not be changed. The Name property values for reserved code points will change if a character is assigned to the code point. For more information, see the Unicode Encoding Stability Policies.


As a corollary to this specification, it should be noted that the value of Field 1 (the string of characters between the semicolon separators) in UnicodeData.txt is the normative specification of the UCD Name property only for Graphic and Format characters other than ideographs and Hangul syllables. All other values that occur in Field 1 are labels that serve other functions in the generation of names lists and charts, or label abbreviated ranges of property definitions, but do not constitute values of the Name property per se.


The term "character name" refers to the Name property value for an encoded character.


[[ In TUS 5.0, on page 79, after the existing definition D10 Code Point, insert the following new definitions. ]]


D10a Code Point Type: Any of the seven fundamental classes of code points in the standard: Graphic, Format, Control, Private-Use, Surrogate, Noncharacter, Reserved.
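
One way to picture the seven classes is as a function of the General_Category property. The sketch below is an assumption-laden illustration, not part of the proposed text: the function name code_point_type is hypothetical, and the mapping used here (Cc to Control, Cs to Surrogate, Co to Private-Use, Cn to Noncharacter or Reserved, Cf/Zl/Zp to Format, everything else to Graphic) is the customary grouping rather than something this definition spells out. Results for unassigned code points depend on the Unicode version bundled with Python.

```python
import unicodedata


def code_point_type(cp: int) -> str:
    """Classify a code point into one of the seven code point types.

    Hypothetical helper; the General_Category mapping is an assumption.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    gc = unicodedata.category(chr(cp))
    if gc == "Cc":
        return "Control"
    if gc == "Cs":
        return "Surrogate"
    if gc == "Co":
        return "Private-Use"
    if gc == "Cn":
        # U+FDD0..U+FDEF and the last two code points of each plane.
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            return "Noncharacter"
        return "Reserved"
    if gc in ("Cf", "Zl", "Zp"):
        return "Format"
    # Letters, marks, numbers, punctuation, symbols, and space separators.
    return "Graphic"


if __name__ == "__main__":
    for cp in (0x0041, 0x200E, 0x0009, 0xE000, 0xD800, 0xFFFE):
        print(f"U+{cp:04X}", code_point_type(cp))
```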



[[The current stability policy is:]]


Once a character is encoded, its character name will not be changed.


[[A request would be made to the officers to change it to be the following:]]


The Unicode Name property value for any non-reserved code point will not be changed. In particular, once a character is encoded its name will not be changed.