Proposal to UTC for comments on ISO 14651


Mark Davis



The proposal is for the Collation Ad Hoc committee to write comments for inclusion in the US comments and in UTC communications to WG20 on the subject of ISO 14651. Because 14651 keeps changing through discussion on the mailing list, the precise content of these comments cannot be fixed at the UTC meeting; it will be completed afterward by the authors (Ken Whistler and Mark Davis).

However, the main goal is to ensure that major implementations of collation that currently produce satisfactory orderings for international character sets (POSIX, Java, Sybase, etc.) can be conformant to ISO 14651. In addition, the proposed Unicode Collation Algorithm (UCA), which pays close attention to the special requirements of Unicode conformance, must be able to conform to 14651. The main changes required of 14651 can be summarized as follows:

  1. levels: Conformant implementations are not required to support more than 3 levels. (They are free to support more than 3, but not required to.)
  2. position: Conformant implementations are not required to support the position designator. (They are free to support the position designator, but not required to.)
  3. backward: Conformant implementations are not required to support the backward designator at any level other than level 2. Moreover, conformant implementations are not required to support anything but a global backward switch (i.e., all weights at a particular level are uniformly either forward or backward). (They are free to support backward at multiple levels, and fine-grained directionality [on a per-character basis], but not required to.)
  4. data: The default data for levels 1, 2, and 3 used by 14651 is consistent with the UCA data (though perhaps not in the same format). While the data resulting from tailoring in 14651 may not be "well-formed" as defined in UCA, the results must be the same as if it were.
  5. Unicode conformance: A conformant implementation must be able to be conformant with the Unicode Standard, including being able to do the following:
    1. treat canonically equivalent strings as precisely equal in ordering.
    2. perform Thai/Lao-style character reversal (see UCA Step 1).
    3. exclude irrelevant combining marks when looking up matches for contracting characters (see UCA Step 2).
  6. stability: No other changes proposed by other national bodies to 14651 that would substantially affect the current major implementations are acceptable to the US or the Unicode Consortium.
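The minimum behavior that items 1, 3, and 5 ask for can be sketched in a few lines. The following Python is a minimal sketch under stated assumptions, not UCA or 14651 itself: the weight table `WEIGHTS` is a hypothetical toy with made-up values, the names `sort_key` and `reorder_thai_lao` are illustrative, and contraction matching (item 5.3) is omitted. It builds a key of exactly three levels, applies a single global backward switch at level 2, normalizes to NFD so canonically equivalent strings get identical keys, and swaps each Thai/Lao prevowel with the following character, roughly in the spirit of UCA Step 1.

```python
import unicodedata

# Hypothetical toy weight table (illustrative values, not UCA/14651 data):
# each character maps to (primary, secondary, tertiary) weights.
# A weight of 0 means "ignorable at this level".
WEIGHTS = {
    "a": (10, 1, 1),
    "A": (10, 1, 2),        # differs from "a" only at level 3 (case)
    "c": (12, 1, 1),
    "e": (14, 1, 1),
    "o": (16, 1, 1),
    "t": (18, 1, 1),
    "\u0301": (0, 3, 1),    # COMBINING ACUTE ACCENT: no primary weight
    "\u0302": (0, 4, 1),    # COMBINING CIRCUMFLEX ACCENT: no primary weight
}

# Thai (U+0E40..U+0E44) and Lao (U+0EC0..U+0EC4) prevowels, which are
# stored before the consonant they are pronounced after.
PREVOWELS = {chr(cp) for cp in list(range(0x0E40, 0x0E45)) + list(range(0x0EC0, 0x0EC5))}


def reorder_thai_lao(s: str) -> str:
    """Swap each Thai/Lao prevowel with the character that follows it."""
    out = []
    i = 0
    while i < len(s):
        if s[i] in PREVOWELS and i + 1 < len(s):
            out.extend((s[i + 1], s[i]))
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def sort_key(s: str, backward_level2: bool = False) -> tuple:
    """Build a three-level sort key; raises KeyError for characters outside WEIGHTS."""
    # Canonically equivalent strings become identical after NFD.
    s = reorder_thai_lao(unicodedata.normalize("NFD", s))
    levels = ([], [], [])
    for ch in s:
        for level, weight in zip(levels, WEIGHTS[ch]):
            if weight != 0:  # skip weights ignorable at this level
                level.append(weight)
    if backward_level2:
        # One global switch: every level-2 weight is compared in reverse.
        levels[1].reverse()
    key = []
    for level in levels:
        key.extend(level)
        key.append(0)  # level separator, lower than any real weight
    return tuple(key)
```

With `backward_level2=True` the French-style ordering cote < côte < coté < côté results; without it, coté sorts before côte. Reversing the entire level-2 list is exactly the global switch of item 3; per-character directionality would need a direction tracked per weight and is not modeled in this sketch.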


  1. The fine-grained backward designation currently in ISO 14651 must allow backward, forward, and neutral characters (the current specification, without neutral, does not work for multiple scripts, which is precisely the case it is supposed to handle).