On 1/27/2012 1:16 PM, Matt Ma wrote:
> Hi,
>
> There are a few characters having no decomposition type defined in
> UnicodeData.txt, but they were assigned tertiary weight in
> allkeys.txt as if the characters had a decomposition type. Here are a
> few examples (version 6.0.0):
>
> ...
> U+A733, U+A732, and U+1F1E6 were given tertiary weights as if they were
> <compat>, while U+31B4 was weighted as if it were <final>.
Yep, that is all done deliberately, to make the default sorting a bit
more consistent.
The normative decompositions in UnicodeData.txt are only the starting point
for attempting to give more consistent default weights for collation.
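
A quick way to confirm the first half of that observation (that these
characters carry no decomposition in the character database) is sketched
below. It assumes Python's standard unicodedata module, which reflects
whatever Unicode version ships with the interpreter rather than 6.0.0
specifically; the helper name decomposition_type is purely illustrative.
It says nothing about the weights in allkeys.txt, which would have to be
read from that file separately.

```python
import unicodedata

# Code points cited in the original question.
CODE_POINTS = [0xA733, 0xA732, 0x1F1E6, 0x31B4]


def decomposition_type(cp):
    """Return the decomposition tag (e.g. '<compat>'), 'canonical' for an
    untagged mapping, or None when the character has no decomposition."""
    field = unicodedata.decomposition(chr(cp))
    if not field:
        return None
    first = field.split()[0]
    # Compatibility mappings start with a tag in angle brackets;
    # canonical mappings start directly with a code point.
    return first if first.startswith("<") else "canonical"


for cp in CODE_POINTS:
    name = unicodedata.name(chr(cp), "<unnamed>")
    print(f"U+{cp:04X} {name}: {decomposition_type(cp)}")
```

If the interpreter's data agrees with UnicodeData.txt, each of the four
lines should end in None, so any <compat>-style or <final>-style tertiary
weight for them comes from allkeys.txt alone.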
>
> Is this something documented outside of UCA?
No, because it is only relevant *to* UCA. At least as far as documentation
written by the UTC is concerned.
Well, I suppose it is also relevant to CLDR, because CLDR bases its
collation tables on a tailoring of allkeys.txt from UCA. I don't know
what documentation there may or may not be about the default treatment
for tertiary weights in CLDR. Somebody involved in the details of CLDR
collation will have to answer that one.
--Ken
Received on Fri Jan 27 2012 - 15:55:37 CST