Re: UCA tertiary weight assignment vs. decomposition type definition in Unicode character database

From: Mark Davis ☕ <mark_at_macchiato.com>
Date: Fri, 27 Jan 2012 16:24:30 -0800

CLDR doesn't modify anything but primary weights in the root ordering. Particular
languages may modify any of the levels, but I don't think tailorings typically
change anything other than primary and secondary weights (with the exception of
Japanese, which is quite complicated).

Mark
*The best is the enemy of the good.*
[https://plus.google.com/114199149796022210033]

On Fri, Jan 27, 2012 at 13:51, Ken Whistler <kenw_at_sybase.com> wrote:

> On 1/27/2012 1:16 PM, Matt Ma wrote:
>
>> Hi,
>>
>> There are a few characters that have no decomposition type defined in
>> UnicodeData.txt, but they were assigned tertiary weights in
>> allkeys.txt as if they had a decomposition type. Here are a
>> few examples (version 6.0.0):
>>
>> ...
>>
>
>> U+A733, U+A732, and U+1F1E6 were given tertiary weights as if they were
>> <compat>, while U+31B4 was weighted as if it were <final>.
>>
>
> Yep, that is all done deliberately, to make the default sorting a bit more
> consistent. The normative decompositions in UnicodeData.txt are only the
> starting point for attempting to give more consistent default weights for
> collation.
>
>
>
>> Is this something documented outside of UCA?
>>
>
> No, because it is only relevant *to* UCA. At least as far as documentation
> written by the UTC is concerned.
>
> Well, I suppose it is also relevant to CLDR, because CLDR bases its
> collation tables on a tailoring of allkeys.txt from UCA. I don't know what
> documentation there may or may not be about the default treatment for
> tertiary weights in CLDR. Somebody involved in the details of CLDR
> collation will have to answer that one.
>
> --Ken
>
>
>
>
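The point made in the quoted exchange is that allkeys.txt derives tertiary weights from an *effective* decomposition type that can differ from the normative one in UnicodeData.txt. A minimal sketch of that distinction follows, assuming Python's standard `unicodedata` module as the source of the normative data (it reflects whatever Unicode version the Python build ships, not necessarily 6.0.0). The override table and the helper names `COLLATION_TAG_OVERRIDES`, `normative_tag`, and `tertiary_tag` are hypothetical, and the table is populated only with the four examples from this thread. The sketch does not compute actual DUCET tertiary weight values.

```python
import unicodedata
from typing import Optional

# Hypothetical override table, built only from the examples in this thread:
# characters with no decomposition type in UnicodeData.txt that allkeys.txt
# (UCA 6.0.0) weights at the tertiary level as if they had one.
COLLATION_TAG_OVERRIDES = {
    0xA732: "compat",   # LATIN CAPITAL LETTER AA
    0xA733: "compat",   # LATIN SMALL LETTER AA
    0x1F1E6: "compat",  # REGIONAL INDICATOR SYMBOL LETTER A
    0x31B4: "final",    # BOPOMOFO FINAL LETTER P
}


def normative_tag(cp: int) -> Optional[str]:
    """Return the decomposition type from the Unicode data, or None.

    unicodedata.decomposition() returns '' when there is no decomposition,
    a string such as '<compat> 0066 0069' for compatibility decompositions,
    and a bare code point list such as '0041 0300' for canonical ones.
    """
    decomp = unicodedata.decomposition(chr(cp))
    if not decomp:
        return None
    if decomp.startswith("<"):
        return decomp[1:decomp.index(">")]
    return "canonical"


def tertiary_tag(cp: int) -> Optional[str]:
    """Tag used to choose a tertiary weight in this sketch: the normative
    decomposition type, unless the hypothetical override table says otherwise."""
    return COLLATION_TAG_OVERRIDES.get(cp, normative_tag(cp))


if __name__ == "__main__":
    for cp in sorted(COLLATION_TAG_OVERRIDES):
        print(f"U+{cp:04X}: normative={normative_tag(cp)}, "
              f"collation={tertiary_tag(cp)}")
```

Keeping the overrides in a separate table, rather than patching the character data, mirrors the point that UnicodeData.txt is only the starting point: the normative property stays untouched, and the collation-specific adjustments are layered on top.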
Received on Fri Jan 27 2012 - 18:30:18 CST

This archive was generated by hypermail 2.2.0 : Fri Jan 27 2012 - 18:30:20 CST