About the Unicode Consortium Stability Policy

Unlike many other standards, the Unicode Standard is continually expanding: new characters are added to meet a variety of uses, ranging from technical symbols to letters for archaic languages. Character properties are also expanded or revised to meet implementation requirements. However, as the Unicode Standard becomes more widely deployed, changes to the standard must be constrained by the requirements of backward compatibility. To that end, the Unicode Consortium Character Encoding Stability Policy limits the ways in which the standards developed by the Unicode Consortium can change.

A primary requirement of stability is that the identity of each character remains unchanged in all future versions of the Unicode Standard. In other words, the same sequence of character codes continues to represent the same text. Consequently, character codes must never be changed, and the properties and behavior of a character must not change to an extent that affects the identity of the character. In addition, code sequences, once normalized, must remain normalized.
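
The normalization guarantee can be illustrated with a minimal sketch using Python's standard `unicodedata` module. The helper name `is_normalized_nfc` is hypothetical and introduced only for this illustration. The sketch can only show that normalized text is a fixed point of normalization under the Unicode version bundled with the running interpreter; the cross-version guarantee itself comes from the policy, not from this code.

```python
import unicodedata


def is_normalized_nfc(text: str) -> bool:
    """Return True if text is already in NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", text) == text


# "e" followed by U+0301 COMBINING ACUTE ACCENT is not in NFC.
decomposed = "e\u0301"

# NFC composes the pair into U+00E9 LATIN SMALL LETTER E WITH ACUTE.
composed = unicodedata.normalize("NFC", decomposed)

# Normalizing already-normalized text leaves it unchanged.
renormalized = unicodedata.normalize("NFC", composed)
```

Under the stability policy, text stored in normalized form (here `composed`) is expected to pass the same check when processed with a later version of the standard, so an implementation would not need to renormalize stored data after an upgrade.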

Additional guarantees restricting possible changes may be added to enable implementers to make safe assumptions that allow more efficient and compact implementations. For example, by limiting the possible distinct values of the General Category, implementations may safely choose a packed format for representing them.
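
As a rough sketch of the packed-format idea, the following assumes the 30 two-letter General Category values reported by Python's `unicodedata.category`, which fit in 5 bits each. The 5-bit width, the ordering of the table, and the function names `pack_categories` and `unpack_category` are assumptions made for this illustration, not part of the standard or of any particular implementation.

```python
import unicodedata

# The 30 General Category values, in an arbitrary order chosen for this sketch.
GENERAL_CATEGORIES = (
    "Lu", "Ll", "Lt", "Lm", "Lo",
    "Mn", "Mc", "Me",
    "Nd", "Nl", "No",
    "Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",
    "Sm", "Sc", "Sk", "So",
    "Zs", "Zl", "Zp",
    "Cc", "Cf", "Cs", "Co", "Cn",
)
CATEGORY_TO_CODE = {name: code for code, name in enumerate(GENERAL_CATEGORIES)}

# 30 values fit in 5 bits (2**5 == 32); this width is an assumption of the sketch.
BITS_PER_CATEGORY = 5
CATEGORY_MASK = (1 << BITS_PER_CATEGORY) - 1


def pack_categories(text: str) -> int:
    """Pack the General Category of each character into one integer."""
    packed = 0
    for index, char in enumerate(text):
        code = CATEGORY_TO_CODE[unicodedata.category(char)]
        packed |= code << (index * BITS_PER_CATEGORY)
    return packed


def unpack_category(packed: int, index: int) -> str:
    """Recover the General Category stored at the given position."""
    code = (packed >> (index * BITS_PER_CATEGORY)) & CATEGORY_MASK
    return GENERAL_CATEGORIES[code]


# "A" is Lu, "1" is Nd, and the space is Zs.
packed = pack_categories("A1 ")
```

A fixed-width encoding like this is only safe if the number of distinct values cannot outgrow the chosen width, which is the kind of assumption such a guarantee is meant to support.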

Character names are immutable, so that they can be used as constant external references to Unicode characters, in order to synchronize identifiers for characters among standards, in particular ISO standards.
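
A small sketch of using names as constant external references, again with Python's `unicodedata` module. The list `REFERENCED_NAMES` and the function `resolve_names` are hypothetical, standing in for identifiers that another standard or data file might record.

```python
import unicodedata

# Hypothetical list of character names recorded by an external specification.
REFERENCED_NAMES = [
    "LATIN SMALL LETTER E WITH ACUTE",
    "GREEK SMALL LETTER ALPHA",
    "EURO SIGN",
]


def resolve_names(names):
    """Map each character name to its code point via unicodedata.lookup."""
    return {name: ord(unicodedata.lookup(name)) for name in names}


code_points = resolve_names(REFERENCED_NAMES)

# The mapping also works in reverse: a code point yields its name.
round_trip = unicodedata.name(chr(code_points["EURO SIGN"]))
```

Because names do not change once assigned, a table keyed by name, such as `code_points`, can be expected to resolve to the same characters under later versions of the standard.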

In an ideal world, the information about existing Unicode characters would be complete and correct at inception, so that maintenance of the standard would be purely additive. However, due to the large number of characters, each associated with many character properties, this ideal cannot be achieved. Despite best efforts, clerical errors are sometimes introduced in the publication process, and there are also cases where the initial information about a character later proves incorrect or incomplete. In both cases, corrections may be required. If these can be made without implying a change to a character's identity, it is usually more beneficial to allow the change than to freeze the mistake, and the stability policy reflects that.

Even if a proposed correction is not prohibited by the stability policy, it must undergo an explicit approval process in the Unicode Technical Committee, including an analysis of its costs and benefits.

In cases where the stability policy prevents a change, the UTC may take one of several actions:
  • add additional characters with the desired properties and behavior
  • add additional properties
  • provide documentation of the unmodifiable mistake
  • add a character annotation or descriptive text in the standard

Occasionally, two separately encoded characters may prove to be unintentional duplicates of each other. In such cases, stability prevents removal of the duplicate character, because removal would invalidate existing data that uses it. Instead, the character may be deprecated: its definition and properties are retained, but its use is strongly discouraged.