Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 24 May 2012 01:17:03 +0100

On Wed, 23 May 2012 15:50:24 -0700
Markus Scherer <markus.icu_at_gmail.com> wrote:

> On Wed, May 23, 2012 at 2:01 PM, Richard Wordingham
> <richard.wordingham_at_ntlworld.com> wrote:
>
> > While we're picking on that poor routine - it looks as though it
> > could come unstuck with kana in the supplementary planes - the Kana
> > Supplement, and possibly also the Enclosed Ideographic Supplement.
> > Do you want a comment on that added to the ticket, or does that
> > issue deserve a whole ticket to itself?
> >
>
> I don't think we need another ticket, but I also don't know what you
> mean with "it could come unstuck...".

I was worried that the kana conversion routines would write whole
characters to the destination strings, even though both source and
destination are specified as being a single code unit. Since I last
looked at the ticket, you've picked up the issue of code unit v.
character, so if you did preserve the kana conversion logic you would
note the issue with destination sizes.
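
To illustrate the code unit v. character point, here is a minimal
Python sketch. It is not the ICU routine; the helper name
utf16_code_units is made up for this example. It only counts how many
UTF-16 code units a character needs, which is what a destination
declared as a single code unit would have to hold.

```python
def utf16_code_units(ch: str) -> int:
    """Return the number of UTF-16 code units needed to encode ch."""
    # Each UTF-16 code unit is two bytes; the "-le" codec adds no BOM.
    return len(ch.encode("utf-16-le")) // 2


# One BMP kana and two supplementary-plane characters for comparison.
samples = {
    "U+3042 HIRAGANA LETTER A": "\u3042",
    "U+1B001 HIRAGANA LETTER ARCHAIC YE (Kana Supplement)": "\U0001B001",
    "U+1F200 SQUARE HIRAGANA HOKA (Enclosed Ideographic Supplement)": "\U0001F200",
}

for name, ch in samples.items():
    print(f"{name}: {utf16_code_units(ch)} code unit(s)")
```

The BMP kana fits in one code unit; the two supplementary-plane
characters each need a surrogate pair, so writing one of them whole
into a one-unit destination would overrun it.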

> > Is there a definition of the precise
> > relationship between DUCET and FractionalUCA.txt, or does
> > FractionalUCA.txt define the relationship?

> See http://www.unicode.org/Public/UCA/latest/CollationAuxiliary.html

As far as I can see, it just says they're different and gives some
*principles* for changes. For example, it doesn't mention the
contractions 0FB2+0F71 and 0FB3+0F71. The text doesn't clearly say
that all changes are identified. I haven't sat down to search for all
the changes - in principle that's a 'hard' task, but in practice it should be
possible to pick out a small residue for human inspection.
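
As a rough sketch of how such a residue might be picked out, the
Python below compares only the multi-code-point keys (contractions) in
two listings. It assumes each data line starts with space-separated
hex code points followed by a semicolon, and it ignores everything
else, including any prefix or context syntax. The function names and
the sample lines are invented for illustration; they do not reproduce
either file, and the weights are placeholders.

```python
import re

# Assumed line shape: "XXXX [XXXX ...] ; rest". Other lines are skipped.
KEY_RE = re.compile(r"^\s*([0-9A-Fa-f]{4,6}(?:\s+[0-9A-Fa-f]{4,6})*)\s*;")


def contraction_keys(lines):
    """Collect keys with more than one code point as tuples of ints."""
    keys = set()
    for line in lines:
        m = KEY_RE.match(line)
        if m:
            cps = tuple(int(h, 16) for h in m.group(1).split())
            if len(cps) > 1:
                keys.add(cps)
    return keys


def residue(lines_a, lines_b):
    """Contractions present in exactly one listing, for human inspection."""
    return sorted(contraction_keys(lines_a) ^ contraction_keys(lines_b))


# Invented sample data; "[placeholder]" stands in for real weights.
ducet_like = [
    "# comment line",
    "0FB2 ; [placeholder]",
    "0FB2 0F80 ; [placeholder]",
]
fractional_like = [
    "0FB2; [placeholder]",
    "0FB2 0F80; [placeholder]",
    "0FB2 0F71; [placeholder]",
    "0FB3 0F71; [placeholder]",
]

for cps in residue(ducet_like, fractional_like):
    print(" ".join(f"{cp:04X}" for cp in cps))
```

A real comparison would also have to look at the weights and their
relative order, not just the set of keys; this only narrows down
where to look.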

Richard.
Received on Wed May 23 2012 - 19:19:16 CDT
