L2/15-183R

Title: Candidate characters for Grapheme_Cluster_Break=Prepend
Author: Roozbeh Pournader (Google)
Date: July 27, 2015
Action: For consideration by the UTC

UAX #29, Unicode Text Segmentation, has long supported a Prepend class of characters, which presently has no members (it used to contain some Southeast Asian characters). The author proposes that the following characters be added to the class:

Group A: Subtending marks

U+0600 ARABIC NUMBER SIGN
U+0601 ARABIC SIGN SANAH
U+0602 ARABIC FOOTNOTE MARKER
U+0603 ARABIC SIGN SAFHA
U+0604 ARABIC SIGN SAMVAT
U+0605 ARABIC NUMBER MARK ABOVE
U+06DD ARABIC END OF AYAH
U+070F SYRIAC ABBREVIATION MARK
U+110BD KAITHI NUMBER SIGN

(The ARABIC SIYAQ NUMBER MARK, proposed in L2/15-074, would also fall into this group.)

Group B: Indic cluster-initial consonants

U+0D4E MALAYALAM LETTER DOT REPH
U+111C2 SHARADA SIGN JIHVAMULIYA
U+111C3 SHARADA SIGN UPADHMANIYA

(These are all the characters with InSC=Consonant_Prefixed or InSC=Consonant_Preceding_Repha. The UTC-approved Soyombo characters U+11A84..11A87 SOYOMBO CLUSTER-INITIAL LETTER LA..SOYOMBO CLUSTER-INITIAL LETTER RA would also fall into this group.)

Rationale:

All the characters above attach to the character or characters immediately following them in a largely inseparable way (typically by subtending or enclosing them), so there should be no grapheme cluster break between them and the character immediately after them. In this respect they are similar to various combining marks, such as U+20DD COMBINING ENCLOSING CIRCLE or U+0332 COMBINING LOW LINE, which form a grapheme cluster with a base character. The difference is that the base character follows the characters listed above instead of preceding them.
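
The intended effect on segmentation can be sketched in Python. This is only a toy illustration, not the UAX #29 algorithm: it assumes the proposed code points are treated as Prepend, approximates Extend by the general categories Mn and Me, and ignores every other rule (Hangul, controls, CR/LF, and so on). The names PROPOSED_PREPEND, is_prepend, is_extend, and toy_clusters are hypothetical and exist only for this example.

```python
import unicodedata

# Code points proposed in this document for Grapheme_Cluster_Break=Prepend.
PROPOSED_PREPEND = frozenset(
    [
        *range(0x0600, 0x0606), 0x06DD, 0x070F, 0x110BD,  # Group A
        0x0D4E, 0x111C2, 0x111C3,                          # Group B
    ]
)


def is_prepend(ch):
    """True if ch is one of the proposed Prepend characters."""
    return ord(ch) in PROPOSED_PREPEND


def is_extend(ch):
    """Rough stand-in for GCB=Extend (assumption): nonspacing or enclosing mark."""
    return unicodedata.category(ch) in ("Mn", "Me")


def toy_clusters(text):
    """Split text into clusters using only two simplified rules:
    no break before an Extend-like mark, and no break after a Prepend character.
    """
    clusters = []
    for ch in text:
        if clusters and (is_extend(ch) or is_prepend(clusters[-1][-1])):
            clusters[-1] += ch  # stay in the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


# ARABIC NUMBER SIGN binds to the digit after it; the later digits stand alone.
print([len(c) for c in toy_clusters("\u0600123")])  # [2, 1, 1]

# COMBINING ENCLOSING CIRCLE binds to the base before it.
print([len(c) for c in toy_clusters("a\u20DDb")])  # [2, 1]
```

The two calls show the symmetry described in the rationale: a combining mark joins the base that precedes it, while a Prepend character joins the base that follows it.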