L2/17-247

Title:   Proposed Property Changes for U+111C9 SHARADA SANDHI MARK
Authors: Ken Whistler, Laurențiu Iancu
Date:    July 26, 2017
Action:  For review by UTC

Background

The discussion in L2/17-153 regarding proposals to encode sandhi marks
for Bengali (L2/16-322) and Newa (L2/16-383) includes an analysis of
the behavior of the already encoded sandhi mark for the Sharada
script, U+111C9 SHARADA SANDHI MARK. That analysis suggested that
the current general category for U+111C9 (gc=Po) is incorrect.
For best implementation in rendering engines, and for consistency with
similar marks in other Indic scripts, U+111C9 should instead be
treated as a combining mark.

This proposal spells out the implications of that suggested property
change in detail, so that an explicit decision can be taken by the
UTC for Unicode 11.0.

In addition to the discussion in L2/17-153 (see Section 4), relevant
information and examples for U+111C9 can be found in L2/12-322 (the
proposal to encode the Sharada sandhi mark) and in L2/09-074 (the proposal
to encode the Sharada script).

Current Property Values for U+111C9

gc=Po
ccc=0
bc=L
lb=AL
Indic_Syllabic_Category=Other
Indic_Positional_Category=NA
Grapheme_Base=Y [derived]
Grapheme_Extend=N [derived]
Grapheme_Cluster_Break=Other
Word_Break=Other
Sentence_Break=Other
Case_Ignorable=N [derived]
ID_Continue=N [derived]
XID_Continue=N [derived]

Proposed Property Values for U+111C9

gc=Mn
ccc=0
bc=NSM
lb=CM
Indic_Syllabic_Category=Syllable_Modifier
Indic_Positional_Category=Bottom
Grapheme_Base=N [derived]
Grapheme_Extend=Y [derived]
Grapheme_Cluster_Break=Extend [derived]
Word_Break=Extend [derived]
Sentence_Break=Extend [derived]
Case_Ignorable=Y [derived]
ID_Continue=Y [derived]
XID_Continue=Y [derived]
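
The two lists can be compared mechanically. The following is a minimal
sketch in Python, with the values copied from the lists above; the names
CURRENT, PROPOSED, and changed_properties are illustrative only and are
not part of any Unicode data file or tool.

```python
# Property values for U+111C9, copied from the two lists above.
CURRENT = {
    "gc": "Po", "ccc": "0", "bc": "L", "lb": "AL",
    "Indic_Syllabic_Category": "Other",
    "Indic_Positional_Category": "NA",
    "Grapheme_Base": "Y", "Grapheme_Extend": "N",
    "Grapheme_Cluster_Break": "Other",
    "Word_Break": "Other", "Sentence_Break": "Other",
    "Case_Ignorable": "N", "ID_Continue": "N", "XID_Continue": "N",
}
PROPOSED = {
    "gc": "Mn", "ccc": "0", "bc": "NSM", "lb": "CM",
    "Indic_Syllabic_Category": "Syllable_Modifier",
    "Indic_Positional_Category": "Bottom",
    "Grapheme_Base": "N", "Grapheme_Extend": "Y",
    "Grapheme_Cluster_Break": "Extend",
    "Word_Break": "Extend", "Sentence_Break": "Extend",
    "Case_Ignorable": "Y", "ID_Continue": "Y", "XID_Continue": "Y",
}


def changed_properties(current, proposed):
    """Return {property: (old, new)} for every property whose value differs."""
    return {
        name: (current[name], proposed[name])
        for name in current
        if current[name] != proposed[name]
    }


changes = changed_properties(CURRENT, PROPOSED)
unchanged = sorted(set(CURRENT) - set(changes))

for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
print("unchanged:", unchanged)
```

Under these inputs, ccc is the only listed property that keeps its value.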

Note that because of normalization stability guarantees, the ccc value cannot
be changed to ccc=220 (Below) or ccc=222 (Below_Right), which otherwise might
be the more natural choices here.
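
The role of ccc in normalization can be illustrated with a short Python
sketch using the standard unicodedata module. The test string is
artificial (Latin combining marks placed around the Sharada mark) and is
chosen only because the ccc values of U+0301 and U+0323 are well known.
It shows that a character with ccc=0 blocks canonical reordering, so
assigning a non-zero ccc later would change the normalized form of
existing text.

```python
import unicodedata

SANDHI = "\U000111C9"   # SHARADA SANDHI MARK, ccc=0
ACUTE = "\u0301"        # COMBINING ACUTE ACCENT, ccc=230
DOT_BELOW = "\u0323"    # COMBINING DOT BELOW, ccc=220


def ccc_sequence(text):
    """Return the canonical combining class of each code point in text."""
    return [unicodedata.combining(ch) for ch in text]


# Without an intervening ccc=0 character, canonical ordering sorts the
# two marks by ccc, so the dot below (220) moves before the acute (230).
reordered = unicodedata.normalize("NFD", "a" + ACUTE + DOT_BELOW)

# With U+111C9 (ccc=0) between them, each mark is in its own run and
# nothing is reordered.
original = "a" + ACUTE + SANDHI + DOT_BELOW
blocked = unicodedata.normalize("NFD", original)

print(ccc_sequence(reordered))
print(ccc_sequence(blocked))
print(blocked == original)
```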

One general implication of the change from gc=Po to gc=Mn is that this
sandhi mark would become valid in identifiers (ID_Continue=Y and
XID_Continue=Y). That could be prevented, if desired, by explicitly
excluding it from the identifier derivations, but a transition from
XID_Continue=N --> Y is an allowed change in general.
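
The effect on identifiers follows from the default derivation of
ID_Continue by general category. The sketch below is a simplification in
Python: it ignores the Other_ID_Start, Other_ID_Continue, Pattern_Syntax,
and Pattern_White_Space adjustments and the NFKC closure used for
XID_Continue. The function name and the "excluded" parameter are
hypothetical; "excluded" merely stands in for an explicit exclusion of
the kind mentioned above.

```python
# General categories admitted by the default derivation (simplified).
ID_START_GC = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_GC = ID_START_GC | {"Mn", "Mc", "Nd", "Pc"}


def id_continue_from_gc(gc, excluded=False):
    """Approximate ID_Continue from the general category alone.

    'excluded' is a hypothetical switch modelling an explicit exclusion
    of the character from the identifier derivation.
    """
    return gc in ID_CONTINUE_GC and not excluded


print(id_continue_from_gc("Po"))                 # current gc
print(id_continue_from_gc("Mn"))                 # proposed gc
print(id_continue_from_gc("Mn", excluded=True))  # proposed gc, excluded
```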

The changes for Bidi_Class, Line_Break, and the other segmentation
properties are just the expected changes to keep them consistent with
the interpretation of U+111C9 as a combining mark. This will change
behavior somewhat (for example, the mark will no longer start its own
grapheme cluster), but as this is a rare mark in a historic script,
those changes should not be considered breaking changes at this point.
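
As a rough illustration of the segmentation effect, the Python sketch
below applies only two grapheme cluster rules: no break before a
character with Grapheme_Cluster_Break=Extend, and a break everywhere
else. It is not a full implementation of the segmentation rules. The
function name is illustrative, and U+11191 is used here simply as an
example Sharada base letter with Grapheme_Cluster_Break=Other.

```python
def grapheme_clusters(chars_with_gcb):
    """Group characters into clusters using a two-rule simplification.

    chars_with_gcb is a list of (character, Grapheme_Cluster_Break value)
    pairs. A character whose value is "Extend" attaches to the preceding
    cluster; every other character starts a new cluster.
    """
    clusters = []
    for ch, gcb in chars_with_gcb:
        if clusters and gcb == "Extend":
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


BASE = "\U00011191"    # example Sharada base letter (illustrative choice)
SANDHI = "\U000111C9"  # SHARADA SANDHI MARK

# Current value: the mark forms a cluster of its own.
current = grapheme_clusters([(BASE, "Other"), (SANDHI, "Other")])

# Proposed value: the mark joins the cluster of the preceding base.
proposed = grapheme_clusters([(BASE, "Other"), (SANDHI, "Extend")])

print(len(current), len(proposed))
```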

The appropriate Indic_Syllabic_Category (ISC) value for the sandhi mark
is not clear. One possible value is Syllable_Modifier, which is simply a
holding category for "miscellaneous combining characters that modify
something in the orthographic syllable they succeed". The appropriate
Indic_Positional_Category (IPC) value is probably Bottom. We suggest
that experts in Indic syllable structure discuss these options to
verify which choices would be most consistent for this case.