Background for PRI #310, New Character Property for Prepended Concatenation Marks

Prepended concatenation marks, also referred to as prefixed format control characters or, more generically, as subtending marks, are preposed marks used in combination with a following sequence of digits or letters that needs to be formatted as a group to denote a year, verse number, abbreviation, or other specialized semantics. An example of the subtending mark U+0601 ARABIC SIGN SANAH in combination with digits is illustrated in the following figure:

[Figure: U+0601 ARABIC SIGN SANAH in combination with digits]

As of Unicode 8.0, there are 9 prepended concatenation marks:

U+0600 ARABIC NUMBER SIGN
U+0601 ARABIC SIGN SANAH
U+0602 ARABIC FOOTNOTE MARKER
U+0603 ARABIC SIGN SAFHA
U+0604 ARABIC SIGN SAMVAT
U+0605 ARABIC NUMBER MARK ABOVE
U+06DD ARABIC END OF AYAH
U+070F SYRIAC ABBREVIATION MARK
U+110BD KAITHI NUMBER SIGN
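
The nine code points listed above can be captured as a hardcoded set. The following Python fragment is a minimal sketch for illustration only; the names PREPENDED_CONCATENATION_MARKS and is_prepended_concatenation_mark are hypothetical and are not part of any Unicode or library API.

```python
# Hypothetical names; the code points are exactly the nine listed above.
PREPENDED_CONCATENATION_MARKS = frozenset({
    0x0600,   # ARABIC NUMBER SIGN
    0x0601,   # ARABIC SIGN SANAH
    0x0602,   # ARABIC FOOTNOTE MARKER
    0x0603,   # ARABIC SIGN SAFHA
    0x0604,   # ARABIC SIGN SAMVAT
    0x0605,   # ARABIC NUMBER MARK ABOVE
    0x06DD,   # ARABIC END OF AYAH
    0x070F,   # SYRIAC ABBREVIATION MARK
    0x110BD,  # KAITHI NUMBER SIGN
})


def is_prepended_concatenation_mark(ch: str) -> bool:
    """Return True if the single character ch is one of the nine listed marks."""
    return ord(ch) in PREPENDED_CONCATENATION_MARKS
```

Any code that needs to treat these characters as a class has to carry a list of this kind and update it whenever a new mark is encoded.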

Prepended concatenation marks are unusual in that they precede the sequence of characters they interact with instead of following a single base character, as regular combining marks do. For that reason, prepended concatenation marks have General_Category=Format (gc=Cf) and are not combining marks (gc=M) as defined in the Unicode Standard. Proper display requires specialized rendering support: the glyph of a prepended concatenation mark extends under (subtending), over (supertending), or around (enclosing) the sequence of applicable characters that immediately follow it. U+0600 ARABIC NUMBER SIGN, as well as U+0601 ARABIC SIGN SANAH illustrated in the previous figure, are examples of subtending marks. U+070F SYRIAC ABBREVIATION MARK exemplifies the supertending marks. U+06DD ARABIC END OF AYAH is a typical example of a prepended concatenation mark that encloses the digits following it.
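
The General_Category distinction described above can be checked with Python's standard unicodedata module. This is a small sketch, not a normative test: the results depend on the version of the Unicode Character Database bundled with the Python interpreter, and U+0301 COMBINING ACUTE ACCENT is chosen here only as an example of a regular combining mark for contrast.

```python
import unicodedata

# The four marks named in the paragraph above.
format_marks = ["\u0600", "\u0601", "\u070F", "\u06DD"]

# Map each mark to its General_Category as reported by unicodedata.
categories = {
    f"U+{ord(ch):04X}": unicodedata.category(ch) for ch in format_marks
}

# A regular combining mark, for contrast (example chosen for this sketch).
combining_acute = "\u0301"

for label, gc in categories.items():
    print(label, gc)
print(f"U+{ord(combining_acute):04X}", unicodedata.category(combining_acute))
```

The four marks report a Format category, while the combining accent reports a Mark category, which is why a test for combining marks alone will not find prepended concatenation marks.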

Although they have distinctive characteristics, as of Unicode 8.0 there is no character property in the Unicode Character Database to designate prepended concatenation marks as a class of characters. Wherever there is a need to refer to or manipulate that class of characters, an explicit enumeration is employed, e.g., in the subsection Subtending Marks in the core specification of Unicode 8.0 or in the derivation of the Grapheme_Cluster_Break property value Prepend given in the Proposed Update UAX #29 for Unicode 9.0.

The proposal is to define a new character property in Unicode 9.0 to handle prepended concatenation marks collectively via properties rather than by hardcoded enumeration. The new property is a binary property, with the value Yes assigned to the 9 characters listed above, and the following main metaproperties:

Long name: Prepended_Concatenation_Mark
Abbreviated name: PCM
Type: Binary
Scope of use: Display and segmentation
Default value: No (N)
Status: Informative
Derivational status: Not derivable
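
To illustrate how a binary property with a default value of No replaces a hardcoded enumeration, the following Python sketch reads the property from data lines and treats every code point not listed as No. The data lines are written in the general "code point or range ; property name" style used by Unicode Character Database files; their exact form here is an assumption made for this sketch, and the names SAMPLE_DATA, parse_binary_property, and has_pcm are hypothetical.

```python
# Assumed sample data covering only the nine characters listed above.
SAMPLE_DATA = """\
0600..0605    ; Prepended_Concatenation_Mark
06DD          ; Prepended_Concatenation_Mark
070F          ; Prepended_Concatenation_Mark
110BD         ; Prepended_Concatenation_Mark
"""


def parse_binary_property(text: str, prop_name: str) -> set:
    """Collect the code points assigned Yes for prop_name in the given text."""
    members = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ";" not in line:
            continue  # skip blank or malformed lines
        code_field, name_field = (part.strip() for part in line.split(";", 1))
        if name_field != prop_name:
            continue
        first, _, last = code_field.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        members.update(range(start, end + 1))
    return members


PCM = parse_binary_property(SAMPLE_DATA, "Prepended_Concatenation_Mark")


def has_pcm(ch: str) -> bool:
    """Yes for listed code points; every other code point defaults to No."""
    return ord(ch) in PCM
```

With this arrangement, code that handles the class tests the property value instead of carrying its own list, so adding a character to the data requires no change to the code.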