Re: CaseFirst and CaseLevel Tailorings of UCA and LDML

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Tue, 22 May 2012 04:08:42 +0100

On Mon, 21 May 2012 17:43:27 -0700
Ken Whistler <kenw_at_sybase.com> wrote:

> > For example, when caseFirst is set to
> > uppercase, ICU orders U+1D34 MODIFIER LETTER CAPITAL H before
> > U+0068 LATIN SMALL LETTER H, but anomalously orders U+A7F8 MODIFIER
> > LETTER CAPITAL H WITH STROKE *after* U+0127 LATIN SMALL LETTER H
> > WITH STROKE because the former's tertiary weight identifies it
> > as <super>, with no entry for the 'Case or kana subtype' class. Is this
> > behaviour required by the UCA + DUCET?
 
> Well, that may be a bug in allkeys.txt.

But, given allkeys.txt as it is, is it required behaviour? You sound
quite unenthusiastic about fixing the arguable bug in allkeys.txt.

> The default tertiary weights aren't completely separated into all the
> possible combinations here, because the required weighting space gets
> out of hand, and seems unnecessary for the edge cases for compatibility
> characters, at least for *default* weighting of such.

It's still a bit of a shock to find 13 characters with the 3-level weights
[.1699.0020.0005], all potentially used in the same 'language':
Mathematics.

It's not just characters with compatibility decompositions. U+A669
CYRILLIC SMALL LETTER MONOCULAR O, U+A66B CYRILLIC SMALL LETTER
BINOCULAR O, U+A66D CYRILLIC SMALL LETTER DOUBLE MONOCULAR O and U+A66E
CYRILLIC LETTER MULTIOCULAR O, which have no decompositions, all
share the <compat> tertiary weight.

> If even in
> *those* circumstances, somebody required uppercase-first tailoring
> to work without exception for U+A7F8, well, then the solution for
> that is simply to tailor the default tertiary weight from 0014 to
> 001D.
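To make the mechanism concrete, here is a rough Python toy model (not ICU's data or algorithm) of an upper-first ordering driven purely by tertiary weights. The primary weights are hypothetical placeholders, U+1D34 is assumed to carry 001D, and the "uppercase" set is a simplified subset of the UTS #10 tertiary weight table. U+0068 and U+0127 at 0002, and U+A7F8 at 0014, follow the discussion above. It reproduces the anomaly and shows the effect of Ken's 0014 -> 001D tailoring:

```python
# Toy model of caseFirst="upper" driven by DUCET-style tertiary weights.
# ASSUMPTIONS (not ICU's real data or algorithm):
#   * primary weights 0x1000/0x1001 are hypothetical placeholders;
#   * U+1D34 is assumed to carry tertiary 0x001D (uppercase <super>);
#   * the "uppercase" set below is a simplified subset of the UTS #10 table.
UPPER_TERTIARIES = {0x0008, 0x0009, 0x000A, 0x000B, 0x000C, 0x001D}

# One collation element per character: (primary, secondary, tertiary).
ELEMENTS = {
    "\u0068": (0x1000, 0x0020, 0x0002),  # LATIN SMALL LETTER H
    "\u1D34": (0x1000, 0x0020, 0x001D),  # MODIFIER LETTER CAPITAL H (assumed 001D)
    "\u0127": (0x1001, 0x0020, 0x0002),  # LATIN SMALL LETTER H WITH STROKE
    "\uA7F8": (0x1001, 0x0020, 0x0014),  # MODIFIER LETTER CAPITAL H WITH STROKE (<super>)
}

# The suggested tailoring: move U+A7F8 from 0x0014 to 0x001D.
TAILORED = dict(ELEMENTS)
TAILORED["\uA7F8"] = (0x1001, 0x0020, 0x001D)


def sort_key(ch, elements, upper_first=True):
    """Put a case rank, derived only from the tertiary weight, ahead of it."""
    primary, secondary, tertiary = elements[ch]
    is_upper = tertiary in UPPER_TERTIARIES
    # upper_first: uppercase gets rank 0; lower-first: lowercase gets rank 0.
    case_rank = 0 if is_upper == upper_first else 1
    return (primary, secondary, case_rank, tertiary)


def ordered(elements, upper_first=True):
    return sorted(elements, key=lambda ch: sort_key(ch, elements, upper_first))


def code_points(chars):
    return [f"U+{ord(c):04X}" for c in chars]


print(code_points(ordered(ELEMENTS)))
# ['U+1D34', 'U+0068', 'U+0127', 'U+A7F8']  (U+A7F8 lands after U+0127)
print(code_points(ordered(TAILORED)))
# ['U+1D34', 'U+0068', 'U+A7F8', 'U+0127']
```

Because the case rank is compared before the tertiary weight itself, a <super> weight outside the uppercase set is treated exactly like lowercase, which is the behaviour described for U+A7F8.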

How would one do that through LDML?

The obvious hack, if one is committed to uppercase-first, is
"&\u0126<<<\ua7f8", but that doesn't work in the ICU demonstrator.
(It maroons U+A7F8 amongst the tertiary variants of plain 'h'.) I
think this is connected to a known problem area involving contractions
and expansions, but the LDML documentation leaves me mystified rather
than explaining what to do.
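For reference, the rule itself only asks for a reset to U+0126 followed by a tertiary relation to U+A7F8. The following Python sketch uses a hypothetical helper, parse_simple_rule (not ICU's rule parser), that handles only this tiny subset of the syntax. It decodes the escapes and splits such a rule into its reset and relations. It says nothing about how ICU then builds the tailoring:

```python
import re

# Relation operators in the rule syntax, matched longest first so that
# "<<<" is not read as "<" followed by "<<".
OPERATORS = {"<<<": "tertiary", "<<": "secondary", "<": "primary", "=": "identical"}


def unescape(text):
    # Decode backslash-u escapes with exactly four hex digits.
    return re.sub(r"\\u([0-9A-Fa-f]{4})", lambda m: chr(int(m.group(1), 16)), text)


def parse_simple_rule(rule):
    # Simplified: one '&reset op item op item ...' chain, with no quoting,
    # contractions, prefixes or [before] options.
    rule = unescape(rule)
    if not rule.startswith("&"):
        raise ValueError("rule must start with a reset '&'")
    parts = re.split(r"(<<<|<<|<|=)", rule[1:])
    reset, rest = parts[0].strip(), parts[1:]
    relations = [(OPERATORS[op], item.strip()) for op, item in zip(rest[::2], rest[1::2])]
    return reset, relations


reset, relations = parse_simple_rule(r"&\u0126<<<\ua7f8")
print(f"reset U+{ord(reset):04X}", [(s, f"U+{ord(c):04X}") for s, c in relations])
# reset U+0126 [('tertiary', 'U+A7F8')]
```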

Richard.
Received on Mon May 21 2012 - 22:11:01 CDT
