Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

From: Markus Scherer <>
Date: Mon, 21 May 2012 17:07:33 -0700

On Mon, May 21, 2012 at 4:37 PM, Richard Wordingham <> wrote:

> What are the definitions of upper and lower case for the caseFirst
> tailoring for the UCA and for LDML? I can't find any obvious
> definition.

I am having trouble finding a published definition too. I suggest you
submit a CLDR ticket for this.

In principle, it's straightforward: lowercase and uppercase follow the Unicode
Character Database (UCD) case properties. We distinguish an intermediate "mixed case" for
titlecase characters and mixed-case contractions. I believe we also
distinguish small/normal Kana as lowercase/uppercase. I can dig up the ICU
code that computes the collation case bits for a string.
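For illustration only, here is a minimal Python sketch of how such case bits might be derived. It is not the ICU code. It assumes three values (lower < mixed < upper), treats titlecase letters (general category Lt) as mixed, treats uncased characters as lowercase, and spots small Kana by their Unicode character names. The names char_case and string_case are hypothetical.

```python
import unicodedata

# Hypothetical case-bit values; the lowest value sorts first by default.
LOWER, MIXED, UPPER = 0, 1, 2


def char_case(ch: str) -> int:
    """Classify one code point (a sketch, not ICU's actual logic)."""
    if unicodedata.category(ch) == "Lt":
        return MIXED  # titlecase letters such as U+01C5 count as mixed
    if ch.isupper():
        return UPPER
    # Assumption: small Kana count as "lowercase", normal Kana as "uppercase".
    name = unicodedata.name(ch, "")
    if name.startswith(("HIRAGANA LETTER", "KATAKANA LETTER")):
        return LOWER if " SMALL " in name else UPPER
    return LOWER  # assumption: uncased characters get the lowercase bits


def string_case(s: str) -> int:
    """Case of a contraction: uniform if all characters agree, else mixed."""
    cases = {char_case(ch) for ch in s}
    return cases.pop() if len(cases) == 1 else MIXED
```

With this sketch, a contraction such as "ch" is lowercase, "CH" is uppercase, and "Ch" is mixed.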

I don't know whether CLDR/LDML should require all of the details, but there
should at least be informative documentation.

When you turn on the case level or use a caseFirst option, these case bits
are used before (or instead of) the tertiary weights. When you use "normal"
3-level sorting, the case bits are ignored and only the tertiary weights
are used.
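A toy Python model of the three settings described above may help. Each collation element is a hypothetical (primary, secondary, tertiary, case) tuple; the weights are made up, not DUCET or ICU values. The sketch models the case level as a separate level between secondary and tertiary, and caseFirst as case bits placed in front of each tertiary weight. With both off, the case bits are simply ignored. The function sort_key and its parameters are hypothetical.

```python
# Hypothetical case-bit values, as in a lower < mixed < upper scheme.
LOWER, MIXED, UPPER = 0, 1, 2


def sort_key(elements, case_level=False, case_first="off"):
    """Build a comparable key from (primary, secondary, tertiary, case) tuples."""

    def case_weight(c):
        # "upper" flips the case bits so uppercase sorts before lowercase.
        return UPPER - c if case_first == "upper" else c

    levels = [
        [p for p, _, _, _ in elements],  # primary level
        [s for _, s, _, _ in elements],  # secondary level
    ]
    if case_level:
        # Separate case level, compared before the tertiary level.
        levels.append([case_weight(c) for _, _, _, c in elements])
    if case_first != "off":
        # Case bits act as the high part of each tertiary weight.
        levels.append([(case_weight(c), t) for _, _, t, c in elements])
    else:
        # Normal 3-level sorting: case bits ignored, tertiary weights only.
        levels.append([t for _, _, t, _ in elements])
    return tuple(tuple(level) for level in levels)
```

For example, with a = [(10, 5, 2, LOWER)] and A = [(10, 5, 8, UPPER)], sort_key(a) < sort_key(A), but sort_key(A, case_first="upper") < sort_key(a, case_first="upper").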

The tertiary weights themselves are separate, and based on a mix of

Best regards,

Google Internationalization Engineering
Received on Mon May 21 2012 - 19:09:31 CDT
