
Change Management for the Unicode Collation Algorithm

As implementations of the Unicode Collation Algorithm become more widespread, the stability and reliability of the UCA data table have become more important. To ensure them, the UTC has approved constraints on allowed changes and has established a more explicit process for tracking and implementing changes to the Default Unicode Collation Element Table (DUCET) between releases of UCA.

Constraints on Changes to the Default Unicode Collation Element Table

1. Changes to the weights of characters that have been in the standard for longer than two years should generally be disallowed. The UTC can overrule this and mandate a change to a character's weight entry, but should do so only when it determines that there is an egregious error or finds some other very strong motivation for disturbing an established value. In less extreme circumstances, solutions involving tailoring should be preferred.

2. When a character weight has been published in UCA for less than two years, any proposed change should be weighed against the viability of a tailoring alternative, with the presumption, all other things being equal, against changing DUCET. This presumption should be used to discourage "tidying up" proposals that disturb the table without demonstrating clear superiority to what already exists.

3. Exceptions to points 1 or 2 may be appropriate in order to maintain synchronization with ISO/IEC 14651, but efforts should be made in WG2 to ensure that destabilizing changes to 14651 are minimized as well.

4. The two-year limit in point 1 may be relaxed for proposed changes to the weights of symbols, punctuation, and format controls, if substantial reasons are provided for such changes. Such characters are usually ignored in collation, and there are few well-established rules for their ordering; hence changes to their weights are less likely to disturb the ordering of existing data or to disrupt existing tailorings. Such changes also do not destabilize ISO/IEC 14651, because the 14651 Common Template Table (CTT) weights these characters as ignorables.

5. All reviewers should concentrate their efforts on reviewing the beta tables for extensions to UCA before a new version of the standard is published, to minimize the need for after-the-fact fixes that might run counter to principles 1 and 2.

6. The beta UCA tables and UCD tables should, if possible, be issued during the same period to allow for sufficient review of weights provided for the new characters.
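
As an illustration only, the age-based rules in points 1, 2, and 4 can be read as a simple triage of a proposed weight change. The Python sketch below is hypothetical: the names ProposedChange, assess, and RELAXABLE_CATEGORIES are invented here, the coarse category labels stand in for real character properties, and the date a weight was first published is assumed to be known. It deliberately omits the ISO/IEC 14651 exception in point 3 and the UTC's discretion to overrule; it only indicates which presumption applies, while the decision itself remains with the UTC.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Presumption(Enum):
    """Which constraint applies to a proposed DUCET weight change (illustrative only)."""
    GENERALLY_DISALLOWED = "point 1: disallow unless the UTC finds an egregious error; prefer tailoring"
    PRESUMPTION_AGAINST = "point 2: weigh against a tailoring alternative; presume no change"
    RELAXED = "point 4: may be considered because substantial reasons were given"


# Hypothetical coarse labels for the character classes named in point 4.
RELAXABLE_CATEGORIES = {"symbol", "punctuation", "format"}


@dataclass
class ProposedChange:
    code_point: int
    weight_first_published: date  # assumed: release date of the UCA version that first weighted it
    category: str                 # hypothetical label, e.g. "letter" or "punctuation"
    substantial_reasons: bool = False


def older_than_two_years(published: date, today: date) -> bool:
    """True if more than two calendar years have passed since `published`."""
    try:
        anniversary = published.replace(year=published.year + 2)
    except ValueError:  # published on 29 February; use 1 March instead
        anniversary = date(published.year + 2, 3, 1)
    return today > anniversary


def assess(change: ProposedChange, today: date) -> Presumption:
    if older_than_two_years(change.weight_first_published, today):
        # Point 4 relaxes the two-year limit only for these classes, and only with substantial reasons.
        if change.category in RELAXABLE_CATEGORIES and change.substantial_reasons:
            return Presumption.RELAXED
        return Presumption.GENERALLY_DISALLOWED
    # Published for less than two years: point 2 applies to every character class.
    return Presumption.PRESUMPTION_AGAINST


if __name__ == "__main__":
    letter = ProposedChange(0x00E5, date(2001, 3, 1), "letter")
    dash = ProposedChange(0x2014, date(2001, 3, 1), "punctuation", substantial_reasons=True)
    print(assess(letter, date(2010, 1, 1)).name)  # GENERALLY_DISALLOWED
    print(assess(dash, date(2010, 1, 1)).name)    # RELAXED
```

Note that a symbol or punctuation weight published less than two years ago still falls under point 2 in this reading, since point 4 relaxes only the limit in point 1.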

Clarity in Specification of Changes to DUCET

1. Any proposed change to existing DUCET entries should be specified in the tailoring syntax used by CLDR, so that it is more likely to be well-formed and unambiguous.

2. Any UTC-mandated change to DUCET will be reviewed by the editorial committee during the process of implementing it in the actual DUCET for the next revision of the UCA standard. A proposed change may turn out not to be well-formed and unambiguous, or may have ramifications in the table that were not obvious when the change was proposed (such as an oversight regarding parallel treatment of a weight change in a related script). If the data is not final, that is, if there is an intervening UTC meeting before the UCA release is to be made, the editorial committee is authorized to make changes in the draft files that, in its judgment, would be most consistent with the goals and decisions of the UTC. It should, however, report the issue both in the PRI text associated with the public review of the change to the table and in its report back to the UTC.

3. In the case of problematic changes as described in point 2, if there is not sufficient time for a UTC decision before the next mandated issuance of an update to UCA, the editorial committee should complete the release of UCA without the problematic change, so as not to hold up the release. The presumption should be that such changes need further discussion and resolution by the UTC; the default action should be to omit them until they are clarified, rather than to incorporate problematic changes into the table that may have to be retracted in a following release.

Tracking Proposed Changes to DUCET

1. So as not to lose track of proposed changes to DUCET, each significant proposed change to the existing table should be tracked using the CLDR bug-tracking process. (This is appropriate in part because changes to DUCET may require further, cascading changes to existing CLDR collation tailorings.)

2. Before a new version of DUCET is released, the CLDR bug database should be reviewed to ensure that each currently open bug on DUCET entries is either:

  • Fixed in the table (and marked closed);
  • Not fixed in the table (and marked closed); or
  • Not fixed in the table (and postponed for future resolution, with an explicit indication on the bug that it is not fixed in the current release of the table).

3. Independently, implementers may submit bug reports on collation tables to the CLDR-TC. In many instances, such bugs will simply result in changes to existing language-specific collation tailorings. But if, in the judgment of the CLDR-TC, such a bug reveals a problem in DUCET itself, the CLDR-TC should file an appropriate bug in the CLDR bug database regarding DUCET and bring the issue to the attention of the UTC for resolution.
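
The pre-release review in point 2 amounts to checking that no DUCET bug is left in an unresolved state. A minimal Python sketch, assuming a hypothetical DucetBug record and Resolution enum (neither is part of the actual CLDR bug tracker), might flag the bugs that still need attention before a release:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Resolution(Enum):
    OPEN = "open"
    FIXED_CLOSED = "fixed in the table; closed"
    NOT_FIXED_CLOSED = "not fixed in the table; closed"
    POSTPONED = "not fixed in the table; postponed"


@dataclass
class DucetBug:
    ticket_id: str                             # hypothetical ticket identifier
    resolution: Resolution
    noted_as_unfixed_in_release: bool = False  # explicit "not fixed in this release" note


def bugs_blocking_release(bugs: List[DucetBug]) -> List[DucetBug]:
    """Return the bugs that do not yet satisfy one of the three allowed outcomes."""
    blocking = []
    for bug in bugs:
        if bug.resolution is Resolution.OPEN:
            blocking.append(bug)  # still open: must be fixed, closed, or postponed
        elif bug.resolution is Resolution.POSTPONED and not bug.noted_as_unfixed_in_release:
            blocking.append(bug)  # postponed without the required explicit indication
    return blocking


if __name__ == "__main__":
    bugs = [
        DucetBug("T1", Resolution.FIXED_CLOSED),
        DucetBug("T2", Resolution.OPEN),
        DucetBug("T3", Resolution.POSTPONED, noted_as_unfixed_in_release=True),
        DucetBug("T4", Resolution.POSTPONED),
    ]
    print([bug.ticket_id for bug in bugs_blocking_release(bugs)])  # ['T2', 'T4']
```

In this sketch a postponed bug counts as resolved only when it carries the explicit note required by the third outcome; an empty result would indicate that the review is complete.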