L2/00-157

From: Karlsson Kent - keka [keka@im.se]
Sent: Tuesday, April 25, 2000 12:54 PM
To: 'mark.davis@us.ibm.com'
Cc: 'Kenneth Whistler'
Subject: RE: proposed changes in UTR#10: Collation

I see no reason for UTR 10 to define its comparisons very differently from those in 14651.

Suggested modified text:

The first weight is called the Level 1 weight (or primary weight), the second is called the Level 2 weight (secondary weight), and the third is called the Level 3 weight (tertiary weight). For a collation element X, these can be abbreviated as X1, X2, and X3. Given two collation elements X and Y, we will use the following notation:

 Notation   Reading                        Meaning
 X =0 Y                                    true
 X =1 Y     X is primary equal to Y        X =0 Y and X1 = Y1, i.e. X1 = Y1
 X =2 Y     X is secondary equal to Y      X =1 Y and X2 = Y2
 X =3 Y     X is tertiary equal to Y       X =2 Y and X3 = Y3
 X =4 Y     X is quaternary equal to Y     X =3 Y and X4 = Y4

 Notation   Reading                        Meaning
 X <0 Y                                    false
 X <1 Y     X is primary less than Y       X <0 Y or (X =0 Y and X1 < Y1), i.e. X1 < Y1
 X <2 Y     X is secondary less than Y     X <1 Y or (X =1 Y and X2 < Y2)
 X <3 Y     X is tertiary less than Y      X <2 Y or (X =2 Y and X3 < Y3)
 X <4 Y     X is quaternary less than Y    X <3 Y or (X =3 Y and X4 < Y4)

The collation algorithm results in a similar ordering among characters and strings, so that for two strings A and B we can write A <2 B, meaning that A is less than B and there is a secondary or primary difference between them. If A <2 B, but A =1 B, we say that there is only a secondary difference between them (which, however, implies that there is also a tertiary difference between them). If two strings have no primary, secondary or tertiary difference according to a given Collation Table, then we write A =3 B. If two strings are equivalent (equal at all levels) according to a given Collation Table, we write A = B. If they are bit-for-bit identical, we write A == B.
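
The recursive definitions in the two tables above can be sketched directly in code. The following is only an illustration for single collation elements, not for whole strings; the function names `equal_at` and `less_at` and the sample weights are assumptions made for this sketch and are not taken from UTR 10, 14651, or any real collation table.

```python
from typing import Sequence


def equal_at(x: Sequence[int], y: Sequence[int], n: int) -> bool:
    """X =n Y: true at level 0, otherwise X =(n-1) Y and Xn = Yn."""
    if n == 0:
        return True
    return equal_at(x, y, n - 1) and x[n - 1] == y[n - 1]


def less_at(x: Sequence[int], y: Sequence[int], n: int) -> bool:
    """X <n Y: false at level 0, otherwise X <(n-1) Y or (X =(n-1) Y and Xn < Yn)."""
    if n == 0:
        return False
    return less_at(x, y, n - 1) or (equal_at(x, y, n - 1) and x[n - 1] < y[n - 1])


# Hypothetical collation elements (primary, secondary, tertiary); the weight
# values are made up for illustration only.
lower_a = (0x0861, 0x0020, 0x0002)
upper_a = (0x0861, 0x0020, 0x0008)
lower_b = (0x0875, 0x0020, 0x0002)

print(equal_at(lower_a, upper_a, 2))  # no primary or secondary difference
print(less_at(lower_a, upper_a, 3))   # only a tertiary difference
print(less_at(lower_a, lower_b, 1))   # a primary difference
```

Because each `<n` includes every lower-level difference, `less_at(x, y, n)` being true implies `less_at(x, y, m)` for every `m >= n`, which matches the remark that a secondary difference also implies a tertiary one.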

This makes all the orders defined total, and avoids the (incomplete) partial orders you defined before. This way one defines the orders that users are likely to be interested in, and the orders given by (e.g.) the Java collation API.

Kind regards

/kent k

=============================================================

Second message from Kent:

> Old
> <version> := <major>.<minor>.<variant> <eol>
> New
> @<version> := <major>.<minor>.<variant> <eol>

Do you mean:
<version> := @<major>.<minor>.<variant> <eol>

> 2. To allow for POSIX-style positions:
> ·     Change the term Shifted to ShiftLow throughout the document
> ·     Add ShiftHigh definition and examples.
> ·     ShiftHigh: The same as ShiftLow, except that all non-variable collation elements get
> a fourth-level weight equal to 0001.

That, however, is not how the POSIX “,position” option works.  (But it seems
that the major proponents of “,position” don’t know how it works either...)
The following text, from 14651, does describe how “,position” works, based on
the informal descriptions given by the proponents of “,position”:

:Subkeys, at the last level, formed with the “forward,position” level
:processing parameter are formed by forming a subkey as with the “forward”
:parameter, but for collating elements that are not "IGNORE"d at all levels
:but the last one, their last level weighting (list of weights) is replaced
:by a single weight (call it <PLAIN> here) that is larger than all other
:weights at the last level in the given tailored table. Collating elements
:that are "IGNORE"d at all levels but the last one, retain their weighting
:according to the given tailored table. Finally, any trailing sequence of
:the maximal weight (<PLAIN>) is removed from the subkey, effectively
:replacing each trailing maximal weight with a zero weight.

Note that <PLAIN> is FFFF in UTR10.  So in essence, and from a UTR10
perspective, ",position" is the same as "Shifted", but with the added
twist of removing any trailing sequence of FFFF weights.
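
As a rough sketch of the 14651 text quoted above, the last-level subkey could be formed as follows. This is one reading of that description, not a reference implementation: the function name `position_subkey`, the representation of a collating element as a list of per-level weight lists (with an empty list standing for "IGNORE"), and the sample weights are all assumptions made for this illustration. In particular, the sketch treats an element as "IGNOREd at all levels but the last one" whenever every earlier level is empty.

```python
PLAIN = 0xFFFF  # the maximal last-level weight; FFFF in UTR 10 terms


def position_subkey(elements):
    """Form the last-level subkey under "forward,position" (assumed model).

    `elements` is a list of collating elements; each element is a list of
    weight lists, one per level, where an empty list means "IGNORE".
    """
    subkey = []
    for levels in elements:
        *earlier, last = levels
        if all(len(weights) == 0 for weights in earlier):
            # Ignored at every earlier level: keep the table's last-level weights.
            subkey.extend(last)
        else:
            # Any other element contributes a single <PLAIN> weight.
            subkey.append(PLAIN)
    # Remove any trailing run of the maximal weight.
    while subkey and subkey[-1] == PLAIN:
        subkey.pop()
    return subkey


# Hypothetical four-level elements; the weight values are made up.
letter = [[0x0861], [0x0020], [0x0002], [0x0001]]
hyphen = [[], [], [], [0x0221]]

print([hex(w) for w in position_subkey([letter, hyphen, letter])])
print(position_subkey([letter, letter]))
```

In this model a string with no variable elements yields an empty last-level subkey, while a hyphen between two letters yields one `PLAIN` followed by the hyphen's own weight, so the position of the hyphen is recorded by the number of preceding `PLAIN` weights.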

Rather than 1) make a false statement about “,position” operation (like your
suggestion), or 2) make a correct statement about “,position” (like what 14651
says), I’d prefer 3) forget about “,position”, since it does not bring any
tangible advantages, and is frequently misinterpreted. Support for it is
NOT required by 14651, and when it is not supported but asked for (if that
is possible in the syntax used) it is to be interpreted in the same way