PRI #281: Proposed encoding model change for New Tai Lue, Background Document

Last revision: 2014/08/29

This public review issue concerns a proposed change to the rendering model for New Tai Lue from logical order, in which pre-base vowels are stored after the initial consonant, to visual order, in which pre-base vowels are stored before the initial consonant (as is the case for Thai, Lao, and Tai Viet). See also document L2/14-090.

Background

The New Tai Lue script has been encoded in Unicode for nearly ten years, since version 4.1 (2005), but it has seen very little use among its primary user community in China. Indeed, the only use by Chinese users in Xishuangbanna has relied either on fonts with legacy encodings or on Unicode fonts that expect the reordering (pre-base) vowels to be stored in visual order.

For example, the news site http://www.dw12.com uses fonts with such an encoding.

Why is this an issue?

Users of New Tai Lue now have to decide whether to stick with the Unicode Standard as currently specified (logical order) or to follow the community's modified encoding model (visual order). Users who migrate from older systems with no dedicated New Tai Lue support to newer systems that implement the logical-order encoding will find that all of their existing documents render incorrectly.

Causes

For users of the New Tai Lue script, a primary usability concern is being able to interact with their text as if it were visually ordered. Thus pre-base vowels must be typeable before the consonant they visually precede. Other concerns, such as correct sorting with simple mechanisms that require no preprocessing, are secondary. Since correct New Tai Lue sorting must take final consonants and tone marks into account, sorting by code point in logical order produces the required results no better than sorting in visual order. Correct New Tai Lue sorting therefore requires a more complex algorithm under either model. Consequently, sorting should not be considered an argument for logically encoded New Tai Lue text.
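To illustrate why plain code point order fails regardless of storage order, here is a minimal Python sketch of a syllable-level sort key that follows the priority order given under Collation (initial, final consonant, vowel, tone). It is hypothetical: the names `syllable_key`, `CONSONANTS`, `FINALS` and `TONES` are illustrative, each string is assumed to be one syllable with exactly one initial consonant, and code point values stand in for real collation weights.

```python
# Hypothetical sketch: a syllable-level sort key for New Tai Lue.
# Code point values stand in for real collation weights.
CONSONANTS = frozenset(chr(cp) for cp in range(0x1980, 0x19AC))  # U+1980..U+19AB
FINALS = frozenset(chr(cp) for cp in range(0x19C1, 0x19C8))      # U+19C1..U+19C7
TONES = frozenset("\u19C8\u19C9")                                # tone marks


def syllable_key(syl: str) -> tuple:
    """Return (initial, final, vowel, tone) for one syllable.

    Assumes exactly one initial consonant. Because characters are bucketed
    by class, a pre-base vowel lands in the vowel slot whether it was
    stored before the consonant (visual order) or after it (logical order).
    """
    initial, vowel, final, tone = [], [], [], []
    for ch in syl:
        if ch in CONSONANTS and not initial:
            initial.append(ord(ch))
        elif ch in FINALS:
            final.append(ord(ch))
        elif ch in TONES:
            tone.append(ord(ch))
        else:
            vowel.append(ord(ch))  # any vowel sign, pre-base or not
    return (tuple(initial), tuple(final), tuple(vowel), tuple(tone))


# Same initial; s1 has vowel sign AA and final U+19C2,
# s2 has no vowel sign and final U+19C1.
s1 = "\u1980\u19B1\u19C2"
s2 = "\u1980\u19C1"
print(sorted([s1, s2]) == [s1, s2])                    # True: code point order puts s1 first
print(sorted([s1, s2], key=syllable_key) == [s2, s1])  # True: the final consonant decides first
```

Neither pair contains a pre-base vowel, so the code point comparison gives the same (wrong) result whether the text is stored in logical or visual order; only the reordered key matches the required priority.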

Implications for Implementers

The cost of changing the rendering model is that existing implementations that use logical order would have to change. These include shaping engines and keyboards that convert visual input to logical order.
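The reordering such a keyboard or converter performs can be sketched in a few lines of Python. This is a hypothetical illustration, not any existing implementation: the function names are invented, and it assumes each pre-base vowel (U+19B5..U+19B7, U+19BA) belongs to the single consonant from U+1980..U+19AB adjacent to it; anything else is copied unchanged.

```python
# Hypothetical sketch of the reordering a keyboard or converter performs
# between visual-order and logical-order New Tai Lue.
PRE_BASE_VOWELS = frozenset("\u19B5\u19B6\u19B7\u19BA")          # E, AE, O, AY
CONSONANTS = frozenset(chr(cp) for cp in range(0x1980, 0x19AC))  # U+1980..U+19AB


def visual_to_logical(text: str) -> str:
    """Move each pre-base vowel after the consonant that follows it."""
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in PRE_BASE_VOWELS and i + 1 < len(text) and text[i + 1] in CONSONANTS:
            out += [text[i + 1], ch]  # logical order: consonant, then vowel
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def logical_to_visual(text: str) -> str:
    """Move each pre-base vowel in front of the consonant that precedes it."""
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS and i + 1 < len(text) and text[i + 1] in PRE_BASE_VOWELS:
            out += [text[i + 1], ch]  # visual order: vowel, then consonant
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


visual = "\u19B5\u1980"  # VOWEL SIGN E typed before LETTER HIGH QA
logical = visual_to_logical(visual)
print([hex(ord(c)) for c in logical])  # ['0x1980', '0x19b5']
assert logical_to_visual(logical) == visual
```

Under the proposed visual-order model, input of this kind would be stored as typed, and code like `visual_to_logical` would no longer be needed on the input path.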

Stability considerations

Unicode has policies governing the changes that may be made to characters encoded in the standard. The policies relevant to this proposal are Identity Stability and Property Value Stability. See

http://www.unicode.org/policies/stability_policy.html#Identity
http://www.unicode.org/policies/stability_policy.html#Property_Value

The Identity Stability policy stipulates that fundamental character identity must not be changed. This proposal would not change the fundamental identity of any characters. It would, however, change the General Category property of certain characters. The Property Value Stability policy prohibits certain types of change to General Category properties, but does not prevent a change from Mc to Lo as proposed in this case.

This change would be destabilizing for any existing New Tai Lue data that conforms to the currently specified logical-order encoding model. The mitigating factor assumed in this proposal is that virtually no existing data conforms to the currently specified model. Rather, evidence suggests that a significant amount of data uses the proposed visual-order encoding model.

Modifications to be made if the model is changed

If New Tai Lue were changed to be a visually ordered script, the following would need to be modified:

  1. Change the General Category of characters U+19B0..U+19C0 and U+19C8..U+19C9 from Mc to Lo. (This would reflect users' view that every character is simply a letter.) The pre-base vowel characters U+19B5..U+19B7 and U+19BA would be given the property Logical_Order_Exception=Y.
  2. Collation. New Tai Lue is typically sorted in the priority order: initial cluster, final consonant, vowel, tone. Supporting this in the DUCET would require too many contractions. The DUCET currently gives New Tai Lue a fallback sort order similar to that of Thai. To keep that behavior, 176 contractions, each a pre-base vowel followed by an initial consonant, would be added to the table.
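The figure of 176 is consistent with 4 pre-base vowels (U+19B5..U+19B7 and U+19BA) times 44 initial consonants, assuming the consonants are the letters U+1980..U+19AB. The Python sketch below enumerates those pairs. Mapping each visual-order pair to the consonant-then-vowel sequence whose weights it takes is an assumption about how a Thai-like fallback would be expressed, and `contraction_table` is a hypothetical name, not part of any DUCET tooling.

```python
# Hypothetical sketch: enumerate the pre-base vowel + initial consonant
# contractions for a Thai-like fallback in the DUCET.
PRE_BASE_VOWELS = (0x19B5, 0x19B6, 0x19B7, 0x19BA)  # E, AE, O, AY
CONSONANTS = range(0x1980, 0x19AC)                  # U+1980..U+19AB, 44 letters


def contraction_table() -> dict:
    """Map each visual-order pair to the logical-order sequence whose
    collation weights it is assumed to take (consonant, then vowel)."""
    return {
        chr(v) + chr(c): chr(c) + chr(v)
        for v in PRE_BASE_VOWELS
        for c in CONSONANTS
    }


table = contraction_table()
print(len(table))  # 176 = 4 pre-base vowels x 44 consonants
```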

Questions for reviewers

The following are some specific points on which feedback is requested:

Feedback on any other aspect of the proposal is, of course, also welcomed.