From: Markus Scherer <markus.icu_at_gmail.com>

Date: Wed, 13 Mar 2013 13:22:01 -0700

On Wed, Mar 13, 2013 at 11:38 AM, Richard Wordingham <richard.wordingham_at_ntlworld.com> wrote:

> One of the changes from Version 6.1.0 to 6.2.0 of the UCA (UTS #10)
> was to change weights from being 16 bits to just being general
> non-negative integers. Was this just to accommodate the 4th weight in
> DUCET (scheduled for deletion in Version 6.3.0), or is it intended to do
> away with the inconvenient concept of 'large weights'?

Neither. It's because the algorithm has very little to do with how exactly

the weights are stored. For example, ICU logically stores weights as

sequences of 1, 2, 3 or 4 bytes, with collation elements encoded in

interesting ways so that most CEs fit into 32-bit integers.
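
As a rough illustration of why the storage width does not matter to the algorithm, here is a minimal sketch in Python. It assumes a simplified three-level sort key (all primaries, then secondaries, then tertiaries, with a 0 separator between levels and zero weights skipped). The weight values, the `sort_key` and `widen` helpers, and the sample strings are all made up for the example; they are not DUCET data and not how ICU stores anything. The point is only that an order-preserving change of weight values, including to values far wider than 16 bits, leaves the resulting order unchanged.

```python
# Toy collation elements: (primary, secondary, tertiary) tuples of
# non-negative integers. All values are invented, NOT taken from DUCET.
TOY_CES = {
    "ab": [(10, 2, 2), (11, 2, 2)],
    "Ab": [(10, 2, 3), (11, 2, 2)],              # tertiary (case) difference
    "áb": [(10, 2, 2), (0, 5, 2), (11, 2, 2)],   # accent as a secondary-only CE
    "b":  [(11, 2, 2)],
}

def sort_key(ces):
    """Build a simplified multi-level sort key from a list of CEs."""
    key = []
    for level in range(3):
        if level > 0:
            key.append(0)  # level separator, lower than any real weight
        # Zero weights are ignorable at that level and are skipped.
        key.extend(ce[level] for ce in ces if ce[level] != 0)
    return tuple(key)

def widen(ces, factor=1 << 40):
    """Order-preserving remap of every weight to a much larger integer."""
    return [tuple(w * factor for w in ce) for ce in ces]

small_order = sorted(TOY_CES, key=lambda s: sort_key(TOY_CES[s]))
wide_order = sorted(TOY_CES, key=lambda s: sort_key(widen(TOY_CES[s])))

# Same order whether the weights fit in 16 bits or need more than 40.
assert small_order == wide_order
```

Only the relative order of the weights at each level enters the comparison, so an implementation is free to pick whatever encoding suits it.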

> Previously, each of the four weights could be accommodated in 16, 16,
> 16 and 24 bits. How many bits may be needed for a DUCET collation
> element now?

There is no plan to change how the DUCET is expressed, nor how the weight

examples are written in the UCA spec.

While the algorithm does not depend on the particular weight size, nor on

the particular weight values, it would be hard and confusing to fully write

the spec without ever using concrete numeric examples.

> Are we threatened with having to accommodate 36-bit weights?

Data structure design is up to each implementation.
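
To make that concrete, here is one purely hypothetical layout an implementation might choose: a 16-bit primary, 8-bit secondary and 8-bit tertiary weight packed into a single 32-bit integer. The `pack_ce` and `unpack_ce` names and the field widths are assumptions for this sketch only; this is not ICU's actual encoding, and nothing in the spec requires it.

```python
# Hypothetical 32-bit layout (an assumption for illustration only):
#   bits 31..16 primary, bits 15..8 secondary, bits 7..0 tertiary.

def pack_ce(primary, secondary, tertiary):
    """Pack three weights into one 32-bit integer, or raise if they do not fit."""
    if not (0 <= primary < (1 << 16)
            and 0 <= secondary < (1 << 8)
            and 0 <= tertiary < (1 << 8)):
        raise ValueError("weight does not fit this illustrative layout")
    return (primary << 16) | (secondary << 8) | tertiary

def unpack_ce(ce):
    """Recover the (primary, secondary, tertiary) weights from a packed CE."""
    return (ce >> 16) & 0xFFFF, (ce >> 8) & 0xFF, ce & 0xFF

packed = pack_ce(0x1234, 0x20, 0x02)
assert unpack_ce(packed) == (0x1234, 0x20, 0x02)
```

An implementation with more distinct weights than such a layout allows would need a different scheme, for example a variable-length one; the algorithm itself is unaffected either way.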

markus

Received on Wed Mar 13 2013 - 15:27:05 CDT

This archive was generated by hypermail 2.2.0: Wed Mar 13 2013 - 15:27:07 CDT