Re: Size of Weights in Unicode Collation Algorithm

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 14 Mar 2013 09:02:15 +0000

On Thu, 14 Mar 2013 00:19:15 +0000
"Whistler, Ken" <ken.whistler_at_sap.com> wrote:

> Richard Wordingham wrote:
>
> > > It loosened up the spec, so that the spec itself didn't seem to be
> > > requiring that each of the first 3 levels had to be expressed
> > > with a full 16 bits in any collation element table.
> >
> > I don't read it that way. But it did allow the 4th weight to go up
> > to 10FFFF! (Last explicit weight in DUCET 6.2.0 is 2A600.)
>
> Actually it didn't "allow" the 4th weight to do anything. The last
> explicit weight in DUCET 6.1.0 was already 2A600 for the 4th level.

DUCET 6.1.0 (and many earlier versions) did not comply with the
corresponding version of the Unicode Collation Algorithm. Indeed, if
an implementation were rash enough to use the DUCET 6.1.0 entries for
decomposable characters and used the 4th weight as an approximation to
a semi-stable sort, it would compare canonically equivalent strings as
unequal.

> > But I can no longer turn round and say that a 36 bit weight is
> > illegal.

> That standard never said that anyway.

It said a collation element was 'an ordered list of three or more
16-bit weights'. That rather excludes 36-bit weights in an input list
of weights. So long as the effects of the implementation are the same,
it can do whatever it likes with the weights internally.
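
To make the distinction concrete, here is a minimal sketch of the check I have in mind. It assumes a collation element is represented as a plain tuple of integer weights. The names fits_16_bit and pack_element are hypothetical, and the weights are illustrative values, not entries copied from any DUCET release (apart from 2A600, the 4th-level value mentioned above).

```python
import struct

MAX_16_BIT = 0xFFFF


def fits_16_bit(element):
    """Return True if every weight in the collation element fits in 16 bits."""
    return all(0 <= weight <= MAX_16_BIT for weight in element)


def pack_element(element):
    """Pack the weights as big-endian unsigned 16-bit fields.

    struct.pack raises struct.error if any weight is outside 0..0xFFFF,
    which is the sort of implicit 16-bit assumption under discussion.
    """
    return struct.pack(">%dH" % len(element), *element)


# Illustrative weights only (assumption): not taken from a real table.
ok_element = (0x15EF, 0x0020, 0x0008, 0x0041)
wide_fourth = (0x0000, 0x0020, 0x0002, 0x2A600)  # 4th weight above FFFF
wide_36_bit = (1 << 35, 0x0020, 0x0002)          # primary weight needs 36 bits

for element in (ok_element, wide_fourth, wide_36_bit):
    print([hex(w) for w in element], fits_16_bit(element))
```

An implementation that stores weights in wider fields internally is unaffected by this check; it only matters where the input is defined as a list of 16-bit weights or where the storage format has 16-bit fields.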

> What is being corrected in the current text of the standard is
> separating the description of the format of DUCET, which *does* use 3
> 16-bit fields to record the 3 weights for each entry, from the
> logical description of tables and the algorithm, which does not
> depend on any particular bit size for the weight values.

But see what you wrote earlier:

> > > It might make sense to include a note somewhere to indicate that
> > > some aspects of the algorithm do implicitly assume that weights
> > > cannot exceed 16-bit values without requiring other adjustments
> > > to the algorithm.
> >
> > I'm listing them at the moment.
>
> O.k.

I submitted the list as feedback last night. I'm not sure if it will be
publicly available soon, as other feedback of mine on the UCA has
ceased to display amongst the feedback on PRI #235. I don't know if
it's been moved to a different feedback route, one in which feedback
disappears from general view for months.

Richard.
Received on Thu Mar 14 2013 - 04:05:55 CDT