Some areas of the Unicode code point space have been earmarked for RTL scripts, and the Bidi_Class value of the unassigned code points in those areas has been set accordingly. The idea is to anticipate the value that a property will have once the code point is assigned to an abstract character, “to maximize compatibility with expected future assignments” (TUS 5.0 p156). This anticipation is not binding in any way: the property value can be changed at the time of assignment (or possibly later). It is nevertheless useful, as it increases the likelihood of a smooth transition when the code point is assigned; at that point, data using the new code points will meet “older” implementations.
In the same vein, we strongly expect that future assignments in the various blocks for CJK ideographs as well as in the rest of plane 2 (SIP) will be for CJK ideographs, and this proposal is to set the value of some properties for the unassigned code points accordingly.
The areas concerned, and their unassigned code points as of Unicode 5.1, are:
|Block name||Block range||Unassigned code points|
|CJK Unified Ideographs Extension A||3400-4DBF||4DB6-4DBF|
|CJK Unified Ideographs||4E00-9FFF||9FC4-9FFF|
|CJK Compatibility Ideographs||F900-FAFF||FA2E-FA2F|
|CJK Unified Ideographs Extension B||20000-2A6DF||2A6D7-2A6DF|
|(SIP outside blocks)||2A6E0-2F7FF|
|CJK Compatibility Ideographs Supplement||2F800-2FA1F||2FA1E-2FA1F|
|(SIP outside blocks)||2FA20-2FFFD|
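As a rough illustration, the ranges in the table above can be written down as data and queried. This is only a sketch: the names `UNASSIGNED_RANGES`, `is_in_proposal` and `total_unassigned` are made up for this example, and the two "(SIP outside blocks)" rows are read as being unassigned in their entirety.

```python
# Unassigned ranges as of Unicode 5.1, transcribed from the table above.
# Each entry is (label, first code point, last code point), both inclusive.
# Assumption: the "(SIP outside blocks)" rows are wholly unassigned.
UNASSIGNED_RANGES = [
    ("CJK Unified Ideographs Extension A", 0x4DB6, 0x4DBF),
    ("CJK Unified Ideographs", 0x9FC4, 0x9FFF),
    ("CJK Compatibility Ideographs", 0xFA2E, 0xFA2F),
    ("CJK Unified Ideographs Extension B", 0x2A6D7, 0x2A6DF),
    ("(SIP outside blocks)", 0x2A6E0, 0x2F7FF),
    ("CJK Compatibility Ideographs Supplement", 0x2FA1E, 0x2FA1F),
    ("(SIP outside blocks)", 0x2FA20, 0x2FFFD),
]


def is_in_proposal(cp):
    """Return True if code point `cp` lies in one of the listed ranges."""
    return any(first <= cp <= last for _, first, last in UNASSIGNED_RANGES)


def total_unassigned():
    """Count the code points covered by all listed ranges."""
    return sum(last - first + 1 for _, first, last in UNASSIGNED_RANGES)


if __name__ == "__main__":
    print(is_in_proposal(0x9FC4))   # first unassigned code point in 4E00-9FFF
    print(is_in_proposal(0x4E00))   # an assigned ideograph, not in the list
    print(total_unassigned())
```

A sorted list of inclusive ranges keeps the sketch close to the table; a real implementation would more likely fold these ranges into its existing property tables.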
Should this proposal be adopted for some version of Unicode, code points that would become assigned by that version would be excluded from this proposal (and just get property values as part of the normal process of encoding).
The potential properties of interest are those which can be predicted accurately, and where the current assignment is different from the expected value:
|short name||long name||assigned||unassigned||proposed|
The properties East_Asian_Width and Line_Break describe the behavior of characters in rendering; predicting those two properties in particular would significantly improve the rendering of text once the characters are assigned. The other properties are more about the identity of the characters, and while the prediction could be accurate, assigning predicted values to unassigned code points may be misleading and may adversely affect invariants.
The proposal is to assign East_Asian_Width = W and Line_Break = ID to the unassigned code points in the first table.
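To make the effect concrete, the following sketch looks up the proposed value for a code point and prints one `start..end;value` line per range, in the style used by UCD data files such as EastAsianWidth.txt and LineBreak.txt. The names `PROPOSED`, `RANGES`, `proposed_value` and `data_lines` are hypothetical, and the ranges are the Unicode 5.1 ones from the first table; this is an illustration, not the text of any actual data file.

```python
# Proposed values for the unassigned code points (from the proposal text).
PROPOSED = {"East_Asian_Width": "W", "Line_Break": "ID"}

# Inclusive unassigned ranges as of Unicode 5.1, from the first table.
RANGES = [
    (0x4DB6, 0x4DBF),
    (0x9FC4, 0x9FFF),
    (0xFA2E, 0xFA2F),
    (0x2A6D7, 0x2A6DF),
    (0x2A6E0, 0x2F7FF),
    (0x2FA1E, 0x2FA1F),
    (0x2FA20, 0x2FFFD),
]


def proposed_value(cp, prop):
    """Return the proposed value of `prop` for `cp`, or None if the
    code point is outside the ranges covered by the proposal."""
    if any(first <= cp <= last for first, last in RANGES):
        return PROPOSED[prop]
    return None


def data_lines(prop):
    """Format one 'start..end;value' line per range for property `prop`."""
    value = PROPOSED[prop]
    return [f"{first:04X}..{last:04X};{value}" for first, last in RANGES]


if __name__ == "__main__":
    for prop in PROPOSED:
        print(f"# {prop}")
        for line in data_lines(prop):
            print(line)
```

An implementation that already treats unassigned code points in these ranges as wide and ideographic for line breaking would need no change when the code points are later assigned with the same values.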