Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Tue, 22 May 2012 08:33:43 -0700

On Tue, May 22, 2012 at 1:09 AM, Richard Wordingham <richard.wordingham_at_ntlworld.com> wrote:

> On Mon, 21 May 2012 17:07:33 -0700
> Markus Scherer <markus.icu_at_gmail.com> wrote:
>
> > In principle, it's straightforward: Lowercase and uppercase follow
> > Unicode (UCD) case properties. We distinguish an intermediate "mixed
> > case" for titlecase characters and mixed-case contractions. I believe
> > we also distinguish small/normal Kana as lowercase/uppercase. I can
> > dig up the ICU code that computes the collation case bits for a
> > string.
>
> Is this code in ICU 4.4.2 (the version for the Linux I run), or should
> I be looking at ICU 49?
>

That code has been in every version of ICU since we introduced the current
collation implementation. I bet that part of the collation builder code has
not changed significantly since ICU 1.8 in 2001... I will try to look for
it today or tomorrow.
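
In the meantime, here is a rough sketch of the classification described in
the quoted paragraph above. It is not the ICU code: the names LOWER, MIXED,
UPPER, char_case and string_case are made up for illustration, the General
Category values Lu/Lt only approximate the UCD case properties, and the
small/normal Kana distinction is left out.

```python
import unicodedata

# Hypothetical three-way case classes, ordered lower < mixed < upper.
LOWER, MIXED, UPPER = 0, 1, 2

def char_case(ch):
    """Classify one character by its General Category (an approximation)."""
    cat = unicodedata.category(ch)
    if cat == "Lu":          # uppercase letter
        return UPPER
    if cat == "Lt":          # titlecase letter, e.g. U+01C5
        return MIXED
    return LOWER             # lowercase and uncased characters alike

def string_case(s):
    """Classify a whole string, e.g. the characters of a contraction."""
    if not s:
        return LOWER         # assumption: treat the empty string as lowercase
    cases = {char_case(ch) for ch in s}
    if cases == {UPPER}:
        return UPPER
    if cases == {LOWER}:
        return LOWER
    return MIXED             # titlecase characters or a mix of cases
```

With this sketch, a contraction such as "ch" would be classified as lower,
"Ch" as mixed, and "CH" as upper.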

> > I don't know whether CLDR/LDML should require all of the details, but
> > there should at least be informative documentation.
>
> If they are to define collation, they have to define how the order
> results from the tailoring. Of course, it can be done by reference,
> but while saying 'as in UCA' is entirely appropriate where the UCA is
> adequately defined (some tailorings clearly are not, and work is under
> way to fix some of these shortfalls), I am uneasy at 'as in ICU'.
>

CLDR does not publish precise conformance tests for attributes and
tailorings. I think it's fair to say that a particular attribute results in
"lower case sorting before upper case" or similar without spelling out
precisely how edge cases might behave. In my opinion, we should give a
little bit of wriggle room to implementations.

However, I think CLDR should also give at least informational guidance
about what this might or should mean. We should definitely say that when
one of the case options is used, the case information trumps the regular
tertiary weights.
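
To illustrate what "trumps" could mean here, a toy model in Python. The
weights in TOY are invented for this example and are not UCA or ICU data;
the secondary level is omitted, and both key functions are hypothetical.
The only point is the order in which the case level and the remaining
tertiary ("variant") level are compared.

```python
# Hypothetical toy weights per character: (primary, case, variant).
# case: 0 = lower, 1 = upper; variant: 0 = plain, 1 = circled form.
TOY = {
    "a": (1, 0, 0),
    "A": (1, 1, 0),
    "\u24d0": (1, 0, 1),   # CIRCLED LATIN SMALL LETTER A
    "\u24b6": (1, 1, 1),   # CIRCLED LATIN CAPITAL LETTER A
}

def _levels(s):
    """Split a string into per-level weight tuples."""
    weights = [TOY[ch] for ch in s]
    primary = tuple(w[0] for w in weights)
    case = tuple(w[1] for w in weights)
    variant = tuple(w[2] for w in weights)
    return primary, case, variant

def key_case_before_variant(s):
    """Case information is compared before the other tertiary differences."""
    primary, case, variant = _levels(s)
    return (primary, case, variant)

def key_variant_before_case(s):
    """For contrast: other tertiary differences are compared before case."""
    primary, case, variant = _levels(s)
    return (primary, variant, case)

items = ["\u24b6", "A", "\u24d0", "a"]
case_first_order = sorted(items, key=key_case_before_variant)
variant_first_order = sorted(items, key=key_variant_before_case)
```

In this toy model, case_first_order puts both lowercase forms (plain, then
circled) ahead of both uppercase forms, while variant_first_order puts both
plain letters ahead of both circled ones.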

Also, the ICU User Guide should document what it does mean in ICU.

markus
Received on Tue May 22 2012 - 10:38:24 CDT
