L2/17-247

Title: Proposed Property Changes for U+111C9 SHARADA SANDHI MARK
Authors: Ken Whistler, Laurențiu Iancu
Date: July 26, 2017
Action: For review by UTC

Background

The discussion in L2/17-153 of the proposals to encode sandhi marks for Bengali (L2/16-322) and Newa (L2/16-383) includes an analysis of the behavior of the already encoded sandhi mark for the Sharada script, U+111C9 SHARADA SANDHI MARK. That analysis suggested that the current general category for U+111C9 (gc=Po) is incorrect. For best implementation in rendering engines, and for consistency with similar marks in other Indic scripts, U+111C9 should instead be treated as a combining mark.

This proposal spells out the implications of that suggested property change in detail, so that the UTC can make an explicit decision for Unicode 11.0.

In addition to the discussion in L2/17-153 (see Section 4), relevant information and examples for U+111C9 can be found in L2/12-322 (the proposal to encode the Sharada sandhi mark) and in L2/09-074 (the proposal to encode the Sharada script).

Current Property Values for U+111C9

gc=Po
ccc=0
bc=L
lb=AL
Indic_Syllabic_Category=Other
Indic_Positional_Category=NA
Grapheme_Base=Y [derived]
Grapheme_Extend=N [derived]
Grapheme_Cluster_Break=Other
Word_Break=Other
Sentence_Break=Other
Case_Ignorable=N [derived]
ID_Continue=N [derived]
XID_Continue=N [derived]

Proposed Property Values for U+111C9

gc=Mn
ccc=0
bc=NSM
lb=CM
Indic_Syllabic_Category=Syllable_Modifier
Indic_Positional_Category=Bottom
Grapheme_Base=N [derived]
Grapheme_Extend=Y [derived]
Grapheme_Cluster_Break=Extend [derived]
Word_Break=Extend [derived]
Sentence_Break=Extend [derived]
Case_Ignorable=Y [derived]
ID_Continue=Y [derived]
XID_Continue=Y [derived]

Note that because of normalization stability guarantees, the ccc value cannot be changed to ccc=220 (Below) or ccc=222 (Below_Right), which might otherwise be the more natural choices here.
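The values marked [derived] are not assigned independently; they follow from gc (plus a few contributory properties). The Python sketch below is a simplified illustration of those derivations, assuming that contributory properties such as Other_Grapheme_Extend and Other_ID_Continue, the Word_Break inputs to Case_Ignorable, and the Pattern_Syntax exclusions can be ignored for U+111C9, and that Grapheme_Cluster_Break is Other whenever Grapheme_Extend=N. The function derive_from_gc is hypothetical and is not part of any Unicode tooling.

```python
from typing import Dict

# Simplified derivation rules, paraphrased from the definitions behind
# DerivedCoreProperties.txt. ASSUMPTION: contributory properties
# (Other_Grapheme_Extend, Other_ID_Continue, ...), the Word_Break inputs
# to Case_Ignorable, and Pattern_Syntax exclusions are ignored here.
NON_BASE_GC = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp"}
CASE_IGNORABLE_GC = {"Mn", "Me", "Cf", "Lm", "Sk"}
ID_START_GC = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_EXTRA_GC = {"Mn", "Mc", "Nd", "Pc"}


def derive_from_gc(gc: str) -> Dict[str, str]:
    """Return values for a few derived properties, given only gc."""
    grapheme_extend = gc in {"Mn", "Me"}
    grapheme_base = gc not in NON_BASE_GC and not grapheme_extend
    id_continue = gc in ID_START_GC or gc in ID_CONTINUE_EXTRA_GC

    def yn(flag: bool) -> str:
        return "Y" if flag else "N"

    return {
        "Grapheme_Base": yn(grapheme_base),
        "Grapheme_Extend": yn(grapheme_extend),
        # GCB=Extend follows from Grapheme_Extend=Y; "Other" is assumed
        # for everything else, which holds for U+111C9 in both cases.
        "Grapheme_Cluster_Break": "Extend" if grapheme_extend else "Other",
        "Case_Ignorable": yn(gc in CASE_IGNORABLE_GC),
        # XID_Continue is assumed equal to ID_Continue for this character.
        "ID_Continue": yn(id_continue),
    }


current = derive_from_gc("Po")
proposed = derive_from_gc("Mn")
for prop in current:
    print(f"{prop}: {current[prop]} -> {proposed[prop]}")
```

Each printed line (for example "Grapheme_Base: Y -> N") corresponds to one of the [derived] transitions listed above, which is why only gc, bc, lb, and the Indic categories need to be changed explicitly.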
The general implication of the change from gc=Po to gc=Mn is that this sandhi mark would become valid in identifiers (ID_Continue and XID_Continue change from N to Y), although not at the start of an identifier. That could be prevented, if desired, by explicitly excluding it from the identifier derivations, but a transition from XID_Continue=N to XID_Continue=Y is an allowed change in general.

The changes to Bidi_Class, to Line_Break, and to the other segmentation properties are simply those expected to keep them consistent with the interpretation of U+111C9 as a combining mark. These changes will alter implementation behavior somewhat, but because this is a rare mark in a historic script, they should not be considered breaking changes at this point.

The appropriate Indic_Syllabic_Category (ISC) value for the sandhi mark is not clear. One possible value is Syllable_Modifier, which is simply a holding category for "miscellaneous combining characters that modify something in the orthographic syllable they succeed". The appropriate Indic_Positional_Category (IPC) value is probably Bottom. We suggest that experts in Indic syllable structure discuss these options to determine which values would be most consistent for this case.
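The identifier and segmentation consequences discussed above can be observed with Python's standard unicodedata module and str.isidentifier, both of which use the Unicode Character Database version bundled with the interpreter (reported by unicodedata.unidata_version). The printed values reflect the proposed properties only if that database already contains them; older data would report gc=Po. Because the standard library has no grapheme segmenter, simple_graphemes is a hypothetical toy that applies only rule GB9 (no break before Extend) and approximates Extend as gc=Mn or gc=Me; it is not a UAX #29 implementation. U+11183 is used here simply as an arbitrary Sharada base letter.

```python
import unicodedata

SANDHI = "\U000111C9"  # SHARADA SANDHI MARK
LETTER = "\U00011183"  # a Sharada letter (gc=Lo), used as a base


def is_extend(ch: str) -> bool:
    # ASSUMPTION: approximates Grapheme_Cluster_Break=Extend by gc in {Mn, Me};
    # the real property also includes Other_Grapheme_Extend and Emoji_Modifier.
    return unicodedata.category(ch) in ("Mn", "Me")


def simple_graphemes(text: str) -> list:
    """Toy segmenter applying only rule GB9: do not break before Extend."""
    clusters = []
    for ch in text:
        if clusters and is_extend(ch):
            clusters[-1] += ch  # attach the mark to the preceding cluster
        else:
            clusters.append(ch)
    return clusters


print("UCD version:", unicodedata.unidata_version)
print("gc:", unicodedata.category(SANDHI))
print("bc:", unicodedata.bidirectional(SANDHI))
print("ccc:", unicodedata.combining(SANDHI))
# With XID_Continue=Y the mark may continue, but not start, an identifier.
print("continues identifier:", (LETTER + SANDHI).isidentifier())
print("starts identifier:", SANDHI.isidentifier())
print("clusters:", simple_graphemes(LETTER + SANDHI))
```

With gc=Mn the letter and the sandhi mark form a single cluster and a valid identifier, whereas with gc=Po the same toy segmenter would split them into two clusters and isidentifier would return False for the pair.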