Re: CaseFirst and CaseLevel Tailorings of UCA and LDML from Markus Scherer on 2012-05-23 (Unicode Mail List Archive)

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Wed, 23 May 2012 15:50:24 -0700

On Wed, May 23, 2012 at 2:01 PM, Richard Wordingham <
richard.wordingham_at_ntlworld.com> wrote:

> While we're picking on that poor routine - it looks as though it could
> come unstuck with kana in the supplementary planes - the Kana
> Supplement, and possibly also the Enclosed Ideographic Supplement. Do
> you want a comment on that added to the ticket, or does that issue
> deserve a whole ticket to itself?
>

I don't think we need another ticket, but I also don't know what you mean
by "it could come unstuck...". If we fix the code to handle all of
Unicode, it should be fine, and anyway it looks like the relevant
characters are all on the BMP. However, the detection seems to be missing
some characters, see http://bugs.icu-project.org/trac/ticket/9337#comment:4
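To illustrate the BMP versus supplementary-plane distinction discussed above, here is a minimal Python sketch. It is not the ICU routine from the ticket. The block ranges are the published Unicode ranges for Hiragana, Kana Supplement, and Enclosed Ideographic Supplement; the helper names (`utf16_units`, `is_supplementary`, `kana_hits_per_unit`) are hypothetical and exist only for this example. It shows why a check that inspects one UTF-16 code unit at a time cannot see a Kana Supplement character, whereas a check on whole code points can.

```python
import struct

# Published Unicode block ranges (end value is exclusive in Python's range()).
HIRAGANA = range(0x3040, 0x30A0)
KANA_SUPPLEMENT = range(0x1B000, 0x1B100)
ENCLOSED_IDEOGRAPHIC_SUPPLEMENT = range(0x1F200, 0x1F300)


def is_supplementary(cp: int) -> bool:
    """True if the code point lies outside the BMP (U+10000..U+10FFFF)."""
    return 0x10000 <= cp <= 0x10FFFF


def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(data) // 2}H", data))


def kana_hits_per_unit(text: str) -> int:
    """Hypothetical per-code-unit check: counts units that fall in a kana block."""
    return sum(
        1 for unit in utf16_units(text)
        if unit in HIRAGANA or unit in KANA_SUPPLEMENT
    )


def kana_hits_per_code_point(text: str) -> int:
    """Same check, applied to whole code points instead of 16-bit units."""
    return sum(
        1 for ch in text
        if ord(ch) in HIRAGANA or ord(ch) in KANA_SUPPLEMENT
    )


sample = "\u3042\U0001B001"  # one BMP hiragana, one Kana Supplement character
print([hex(u) for u in utf16_units(sample)])
print(kana_hits_per_unit(sample), kana_hits_per_code_point(sample))
```

The per-unit version sees only the two surrogate code units for U+1B001, neither of which falls in a kana block, so it misses the character. Iterating by code point avoids that.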

> Comment 2 in http://bugs.icu-project.org/trac/ticket/9337 seems to be
> the answer to my opening question - the case for caseFirst and
> caseLevel tailorings is defined, in the absence of non-parametric
> tailorings, by FractionalUCA.txt.

Yes, but I question whether that code is correct as well. In my opinion,
the case bits for FractionalUCA.txt and for ICU tailorings should use the
same algorithm, and we should document what that is. I will work with Mark
on this, but not too soon.

> Is there a definition of the precise
> relationship between DUCET and FractionalUCA.txt, or does
> FractionalUCA.txt define the relationship?

See http://www.unicode.org/Public/UCA/latest/CollationAuxiliary.html

> I presume FractionalUCA.txt
> takes precedence over UCA_Rules.txt.

They are supposed to express the same order, but FractionalUCA.txt provides
more detailed data.

> They do differ - the file
> FractionalUCA.txt assigns <U+0FB2, U+034F, U+0F71> and <U+0FB2, U+0F71>
> the same 3-level weights, but UCA_Rules.txt assigns them a tertiary
> difference. I've reported that in formal Unicode feedback.
>

Ok, thanks.
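
For readers who want to look at the two sequences from the quoted report, the following Python sketch builds them and lists their characters using only the standard `unicodedata` module. It makes no claim about the weights in FractionalUCA.txt or UCA_Rules.txt; it only shows that the sequences differ by U+034F. The variable names and the `describe` helper are illustrative assumptions.

```python
import unicodedata

# The two sequences named in the report; they differ only by U+034F.
with_cgj = "\u0FB2\u034F\u0F71"
without_cgj = "\u0FB2\u0F71"


def describe(text: str) -> list[str]:
    """List each character as 'U+XXXX NAME (ccc=N)'."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')} "
        f"(ccc={unicodedata.combining(ch)})"
        for ch in text
    ]


for label, seq in (("with U+034F", with_cgj), ("without U+034F", without_cgj)):
    print(label)
    for line in describe(seq):
        print("  ", line)

# Removing U+034F from the first sequence yields the second.
print(with_cgj.replace("\u034F", "") == without_cgj)
```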

markus
Received on Wed May 23 2012 - 17:54:13 CDT
