Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Fri, 25 May 2012 01:36:03 +0100

On Wed, 23 May 2012 17:47:09 -0700
Markus Scherer <markus.icu_at_gmail.com> wrote:

> Also, I just saw that
> http://www.unicode.org/Public/UCA/latest/CollationAuxiliary.zip contains
> allkeys_CLDR.txt which should correspond 1:1 with the
> FractionalUCA*.txt in the same .zip file.

> One format difference: <snip>

Skimming the tail end of the diff, I spotted two differences. The first:
DUCET allkeys.txt gives the same 4th level weight to U+2FA6, U+328E and
U+F90A, although U+91D1 is only a compatibility decomposition of the
first two (it is the canonical decomposition of U+F90A). By contrast,
allkeys_CLDR.txt follows the documented process of setting the 4th level
weight according to the canonical decomposition. This pattern seems to
be repeated throughout the CJK characters.
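
For anyone wanting to check the decomposition types, here is a minimal
sketch using Python's standard unicodedata module. The helper name
decomposition_kind is made up for illustration, and the result reflects
whatever Unicode version the Python build ships with, not the UCA data
files themselves.

```python
import unicodedata

def decomposition_kind(ch):
    """Classify a character's decomposition mapping (hypothetical helper)."""
    field = unicodedata.decomposition(ch)
    if not field:
        return "none"
    # Compatibility mappings carry a formatting tag such as <compat> or <circle>;
    # canonical mappings are a bare list of code points.
    return "compatibility" if field.startswith("<") else "canonical"

for cp in (0x2FA6, 0x328E, 0xF90A):
    ch = chr(cp)
    print(f"U+{cp:04X} {decomposition_kind(ch):13} {unicodedata.decomposition(ch)}")
```

Only U+F90A should come out as canonical; the other two map to U+91D1
through tagged (compatibility) decompositions.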

The second difference is again at the 4th level: allkeys_CLDR.txt gives
different 4th level weights to the canonically equivalent U+1B40 and
<U+1B3E, U+1B35>, which is wrong. It's as though DUCET and the root
locale collation were generated from imperfectly aligned programs rather
than one being derived from the other.
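
The canonical equivalence itself is easy to confirm. A rough sketch,
again assuming Python's unicodedata (the function name
canonically_equivalent is only illustrative): two strings are
canonically equivalent exactly when their NFD forms are identical, so
any conformant collation should give them identical keys at every
level.

```python
import unicodedata

def canonically_equivalent(a, b):
    """True if a and b have the same NFD form (illustrative helper)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u1B40"        # U+1B40
sequence = "\u1B3E\u1B35"     # <U+1B3E, U+1B35>

print(canonically_equivalent(precomposed, sequence))
# Compatibility-only relationship, for contrast: U+2FA6 vs U+91D1.
print(canonically_equivalent("\u2FA6", "\u91D1"))
```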

As ICU does not load the 4th level weight, it will be shielded from
these issues.

Richard.
Received on Thu May 24 2012 - 19:43:45 CDT
