US Comments on PDAM 1 to ISO/IEC 14651 - International string ordering, Amendment #1
August 17, 2001
The US votes YES with comments.
GC1. Unicode 3.1 Repertoire. The industry has moved much more rapidly than anticipated to Unicode 3.1 / ISO/IEC 10646, Part 2. As a result, there is strong implementation demand to extend the collation algorithm to the full repertoire, including supplementary characters (those with code points between 0x10000 and 0x10FFFF). The earlier decisions to limit the repertoire handled by 14651 to the Unicode 3.0 repertoire have been overtaken by events, and it is best to extend the repertoire as soon as possible. The US thus requests that the repertoire covered in Table 1 be extended to cover the complete repertoire of Unicode 3.1.
The US is willing to supply draft weightings of these additional characters in an expanded CTT as a basis for this work.
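As an illustration of the code point range named in GC1, the following minimal Python sketch identifies supplementary characters in a string. The function names are hypothetical and are not part of 14651 or of any CTT tooling; the sample character U+10330 lies in the Gothic block, which is part of the Unicode 3.1 repertoire.

```python
def is_supplementary(code_point):
    """Return True for code points in the range 0x10000..0x10FFFF."""
    return 0x10000 <= code_point <= 0x10FFFF


def supplementary_code_points(text):
    """List the supplementary code points that occur in text, in order."""
    return [ord(ch) for ch in text if is_supplementary(ord(ch))]


if __name__ == "__main__":
    sample = "a\U00010330"  # LATIN SMALL LETTER A followed by U+10330
    for cp in supplementary_code_points(sample):
        print("U+%04X" % cp)
```

Characters reported by such a check are the ones that have no weight entry as long as Table 1 is limited to the Unicode 3.0 repertoire.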
TC1. U+1670 CANADIAN SYLLABICS NGAI. Based on contributions from Canadian Syllabics experts consulted by the US national body, the weight entry for this character needs to be moved so that U+1670 has a primary weight after U+158D and before U+158E, as shown here:
...
<S158D> % CANADIAN SYLLABICS WEST-CREE RA
<S1670> % CANADIAN SYLLABICS NGAI
<S158E> % CANADIAN SYLLABICS NGAAI
...
TC2. Runes. The Runic section should be reordered to be in Fuþark order. Ordering in Swedish transliteration order is like ordering Ancient Greek by English transliteration order. Fuþark is the order of the original alphabet, and provides a neutral ordering which is not tied to any particular locale-specific Runic transliteration system. Fuþark order is still widely used today by members of the general public using runes, and the expectation is that any ordering of runes be in Fuþark order. Most people encountering runes will be dependent on the default ordering provided by their platforms, whereas experts will be in a better position to work with data tables with extra columns for transliterations. The US supports the Irish position regarding the details of the Fuþark order that should be in the CTT.
While running extensive stress tests and corner-case tests against the CTT, a number of consistency problems were encountered. The solutions to these problems described below have no effect on the normal use of characters, but establish the formal correctness of the standard. A separate document will be provided to WG20 that will describe these consistency problems in detail. The following technical comments represent changes required to fix these problems in the CTT.
TC3. Trailing Tertiaries. Add a tertiary symbol "<MAX>" which is always to be at the end of the tertiary symbol weight list. Currently this would occur in the following position:
...
<SQUAREDCAP>
<FRACTION>
<MAX>
% Second-level weight assignments
<BASE>
...
In addition a collating symbol should be provided in the list of collating symbols:
...
collating-symbol <FRACTION>
collating-symbol <MAX>
...
In the weighting elements, certain characters (limited to a subset of those that at the tertiary level contain a sequence of non-min tertiary weights) should have the second and subsequent tertiary weights replaced by this new "<MAX>". Example (the current entry, followed by the entry as changed):
<U2473> "<S0032><S0030>";"<BASE><BASE>";"<CIRCLE><CIRCLE>";<U2473> % CIRCLED NUMBER TWENTY
<U2473> "<S0032><S0030>";"<BASE><BASE>";"<CIRCLE><MAX>";<U2473> % CIRCLED NUMBER TWENTY
The precise list of characters that require this will be supplied in a separate document. The list will minimize the number of characters required to fix the consistency problem. It is a short list -- a small subset of the compatibility characters that have expansions.
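The substitution described in TC3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a proposed tool: the function names are hypothetical, the line parsing assumes the simple "character, four semicolon-separated levels, % comment" layout of the CIRCLED NUMBER TWENTY example, and the function does not decide which characters need the change. The caller is assumed to pass only entries from the list to be supplied separately.

```python
import re

# Matches a single collating symbol such as <CIRCLE> or <S0032>.
SYMBOL = re.compile(r"<[^<>]+>")


def split_entry(line):
    """Split a weighting line into (character, level fields, comment)."""
    body, _, comment = line.partition("%")
    char, _, weights = body.strip().partition(" ")
    return char, weights.strip().split(";"), comment.strip()


def apply_max(line):
    """Replace the second and subsequent tertiary weights with <MAX>.

    Assumption: the caller has already restricted the input to the
    characters that require this change.
    """
    char, fields, comment = split_entry(line)
    tertiary = SYMBOL.findall(fields[2])
    if len(tertiary) > 1:
        tertiary = [tertiary[0]] + ["<MAX>"] * (len(tertiary) - 1)
        fields[2] = '"' + "".join(tertiary) + '"'
    return "%s %s %% %s" % (char, ";".join(fields), comment)


if __name__ == "__main__":
    current = ('<U2473> "<S0032><S0030>";"<BASE><BASE>";'
               '"<CIRCLE><CIRCLE>";<U2473> % CIRCLED NUMBER TWENTY')
    print(apply_max(current))
```

Entries with a single tertiary weight pass through unchanged, so only expansions are affected.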
TC4. Modify handling of secondaries for Numerics. These are to be weighted consistently with the approach used in other constructed secondaries (not involving an accent), such as in:
<U16AA> <S16A8>;"<BASE><VRNT1>";"<COMPAT><MIN>";<U16AA> % RUNIC LETTER AC A
Thus, for example, the entry for a Mongolian digit would change from the first line below to the second:
<U1811> <S0031>;<MONGL>;<MIN>;<U1811> % MONGOLIAN DIGIT ONE
<U1811> <S0031>;"<BASE><MONGL>";"<MIN><MIN>";<U1811> % MONGOLIAN DIGIT ONE
The numeric script secondary symbols to which this should be applied are the following:
<NEGATIVE> <SANSSERIF> <NEGSANSSERIF> <ARABIC> <EXTARABIC> <ETHPC> <NAGAR> <BENGL> <BENGALINUMERATOR> <GURMU> <GUJAR> <ORIYA> <TAMIL> <TELGU> <KNNDA> <MALAY> <THAII> <LAAOO> <BODKA> <MYANM> <KHMER> <MONGL> <CJKVS>
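The rewrite requested in TC4 can be sketched in Python as follows. This is a minimal illustration with hypothetical names. It assumes the simple entry layout of the Mongolian example, and it assumes that the existing tertiary weight stays in first position with "<MIN>" appended. The Mongolian example does not distinguish that from the reverse order, so this detail is a guess made by analogy with the Runic entry.

```python
# Secondary symbols taken from the list in TC4.
NUMERIC_SECONDARIES = {
    "<NEGATIVE>", "<SANSSERIF>", "<NEGSANSSERIF>", "<ARABIC>",
    "<EXTARABIC>", "<ETHPC>", "<NAGAR>", "<BENGL>", "<BENGALINUMERATOR>",
    "<GURMU>", "<GUJAR>", "<ORIYA>", "<TAMIL>", "<TELGU>", "<KNNDA>",
    "<MALAY>", "<THAII>", "<LAAOO>", "<BODKA>", "<MYANM>", "<KHMER>",
    "<MONGL>", "<CJKVS>",
}


def rewrite_numeric_secondary(line):
    """Turn a bare numeric secondary into a constructed <BASE> sequence."""
    body, _, comment = line.partition("%")
    char, _, weights = body.strip().partition(" ")
    fields = weights.strip().split(";")
    secondary, tertiary = fields[1], fields[2]
    if secondary in NUMERIC_SECONDARIES:
        fields[1] = '"<BASE>%s"' % secondary
        # Assumption: keep the original tertiary first, then pad with <MIN>.
        fields[2] = '"%s<MIN>"' % tertiary
    return "%s %s %% %s" % (char, ";".join(fields), comment.strip())


if __name__ == "__main__":
    current = "<U1811> <S0031>;<MONGL>;<MIN>;<U1811> % MONGOLIAN DIGIT ONE"
    print(rewrite_numeric_secondary(current))
```

Entries whose secondary is already a quoted sequence, such as the Runic example, do not match the set and are returned unchanged.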