Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Wed, 23 May 2012 17:47:09 -0700

On Wed, May 23, 2012 at 5:17 PM, Richard Wordingham <richard.wordingham_at_ntlworld.com> wrote:

> > > Is there a definition of the precise
> > > relationship between DUCET and FractionalUCA.txt, or does
> > > FractionalUCA.txt define the relationship?
>
> > See http://www.unicode.org/Public/UCA/latest/CollationAuxiliary.html
>
> As far as I can see, it just says they're different and gives some
> *principles* for changes.

FractionalUCA.txt used to provide (or at least was supposed to provide) the
same order as allkeys.txt. I am only aware of one exception: the addition
of prefix contractions.

Other than that, a year or two ago CLDR started shuffling around some
symbols & punctuation characters so that those categories are separated
better in collation reordering. That should be it for intended ordering
differences.

And FractionalUCA.txt should agree with UCARules.txt.

> For example, it doesn't mention the
> contractions 0FB2+0F71 and 0FB3+0F71. The text doesn't clearly say
> that all changes are identified. I haven't sat down to search for all
> the changes - in principle that's a 'hard' task, but in practice it should
> be possible to pick out a small residue for human inspection.
>

The order of code points and contractions as listed in FractionalUCA.txt
and allkeys.txt should be the same, except for intended differences. So if
you remove comments and anything after a semicolon and ignore white space,
then a simple file diff should show the ordering differences.
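The diff recipe above can be sketched in Python roughly as follows. This is a minimal sketch, assuming that "#" introduces a comment in both files and that the code point sequence is whatever precedes the first ";"; the function names `order_keys` and `order_diff` are hypothetical. Header and directive lines (such as a version line) are not filtered out, so they will show up in the diff alongside any genuine ordering differences.

```python
import difflib


def order_keys(lines):
    """Reduce each mapping line to its code point sequence.

    Assumption: '#' starts a comment and the key is everything
    before the first ';'. Runs of white space are collapsed so that
    spacing differences between the two files do not matter.
    """
    keys = []
    for line in lines:
        line = line.split("#", 1)[0]   # remove comments
        line = line.split(";", 1)[0]   # remove anything after a semicolon
        key = " ".join(line.split())   # ignore white space differences
        if key:                        # skip lines that are now empty
            keys.append(key)
    return keys


def order_diff(allkeys_lines, fractional_lines):
    """Return unified-diff lines showing where the two orders differ."""
    return list(difflib.unified_diff(
        order_keys(allkeys_lines),
        order_keys(fractional_lines),
        fromfile="allkeys.txt",
        tofile="FractionalUCA.txt",
        lineterm="",
    ))
```

For the real files, opening both with `encoding="utf-8"` and passing the file objects to `order_diff` should be enough; an empty result means the two files list code points and contractions in the same order.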

Also, I just saw that
http://www.unicode.org/Public/UCA/latest/CollationAuxiliary.zip contains
allkeys_CLDR.txt, which should correspond 1:1 with the
FractionalUCA*.txt in the same .zip file.

One format difference: a couple of contractions with "middle dot" are
expressed in FractionalUCA.txt with a "prefix" (or "context") syntax, but
the relevant sequences should sort the same as their contractions in
allkeys.txt and in UCARules.txt. (This avoids the performance penalty of
making the first character of each such sequence a contraction starter.)
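To illustrate why a prefix rule can stand in for a contraction, here is a toy Python sketch. Everything in it is hypothetical and heavily simplified: the letter "L" followed by U+00B7 MIDDLE DOT is used only as an example sequence, the weights are made-up integers rather than real collation elements, only two-character sequences are handled, and the names `BASE`, `CONTRACTIONS`, `PREFIXES` and both functions are invented for this example. It is not how ICU or CLDR implement either mechanism; it only shows that both forms can yield the same collation element sequence.

```python
# Hypothetical toy weights for illustration only; not DUCET/CLDR values.
BASE = {"L": [0x2000], "a": [0x1C00], "\u00b7": [0x0300]}

# Contraction form: "L" + U+00B7 has its own entry, so every "L" becomes
# a contraction starter and forces a look-ahead at the next character.
CONTRACTIONS = {"L\u00b7": [0x2000, 0x0301]}

# Prefix form: U+00B7 gets special weights only when preceded by "L".
# "L" stays an ordinary character; only U+00B7 triggers a look-back.
PREFIXES = {("L", "\u00b7"): [0x0301]}

STARTERS = {key[0] for key in CONTRACTIONS}


def ces_with_contractions(s: str) -> list[int]:
    """Map s to toy weights, matching two-character contractions greedily."""
    out: list[int] = []
    i = 0
    while i < len(s):
        if s[i] in STARTERS and s[i:i + 2] in CONTRACTIONS:
            out.extend(CONTRACTIONS[s[i:i + 2]])
            i += 2
        else:
            out.extend(BASE[s[i]])
            i += 1
    return out


def ces_with_prefixes(s: str) -> list[int]:
    """Map s to toy weights, checking the preceding character for prefixes."""
    out: list[int] = []
    for i, ch in enumerate(s):
        if i > 0 and (s[i - 1], ch) in PREFIXES:
            out.extend(PREFIXES[(s[i - 1], ch)])
        else:
            out.extend(BASE[ch])
    return out
```

In the contraction version every "L" requires a peek at the following character, while in the prefix version the common letter is looked up directly and only the much rarer middle dot looks backward; that asymmetry is one plausible reading of the performance penalty mentioned above.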

markus
Received on Wed May 23 2012 - 19:49:50 CDT