RE: UCD 3.1, Final Beta - Case folding

From: Carl W. Brown (cbrown@xnetinc.com)
Date: Mon Mar 05 2001 - 14:44:38 EST


-----Original Message-----
From: Antoine Leca [mailto:Antoine.Leca@renault.fr]
Sent: Monday, March 05, 2001 9:57 AM
To: Unicode List
Cc: Unicode List
Subject: Re: UCD 3.1, Final Beta - Case folding

>Carl W. Brown wrote:
>>
>> I noticed that there is no mention of the casing special case:
>>
>> # Lithuanian
>>
>> 0307; 0307; ; ; lt AFTER_i; # Remove DOT ABOVE after "i" with upper or titlecase
>>
>> The case folding is locale-less, so it seems to me that it is probably better
>> to remove the COMBINING DOT ABOVE after all 'i' / 'I' regardless of locale
>> to make it work for Lithuanian. I doubt that this will cause serious
>> problems with caseless compares for other locales.

>I think the 'I' above is a typo, isn't it? You meant 'j', don't you?

I do mean 'i' not 'j'.

>If not, please consider a Turkish text, fully decomposed: there, a dot_above
>U+0307 following an uppercase I U+0049 should certainly *not* be dropped.

This works for Turkish as well. Case folding folds dotted and dotless i
into 'i'.

0049; C; 0069; # LATIN CAPITAL LETTER I
0130; I; 0069; # LATIN CAPITAL LETTER I WITH DOT ABOVE
0131; I; 0069; # LATIN SMALL LETTER DOTLESS I

By removing the COMBINING DOT ABOVE, the fully decomposed text will match
the composed text and therefore be a better representation of case folding.
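
As a rough illustration of the proposal (it is not part of the UCD data
files), here is a minimal Python sketch. The names I_FOLDS and
fold_i_proposal are hypothetical, the table contains only the three entries
quoted above, and Python's built-in str.casefold() stands in for the rest of
the folding data. It also assumes the dot immediately follows the 'i', with
no other combining marks in between.

```python
import re

# The three folding entries quoted above, as a translate() table.
# Hypothetical name; only these entries are taken from the message.
I_FOLDS = {
    0x0049: 0x0069,  # LATIN CAPITAL LETTER I
    0x0130: 0x0069,  # LATIN CAPITAL LETTER I WITH DOT ABOVE
    0x0131: 0x0069,  # LATIN SMALL LETTER DOTLESS I
}

def fold_i_proposal(text):
    """Sketch of the proposed locale-less folding for caseless compares."""
    # Fold every flavour of 'i' first, then let casefold() handle the rest.
    folded = text.translate(I_FOLDS).casefold()
    # Proposed extra step: drop a COMBINING DOT ABOVE (U+0307) that
    # immediately follows 'i', regardless of locale.
    return re.sub("(?<=i)\u0307", "", folded)

# Composed and fully decomposed forms now compare equal.
composed = "\u0130stanbul"     # starts with U+0130
decomposed = "I\u0307STANBUL"  # starts with U+0049 U+0307
print(fold_i_proposal(composed) == fold_i_proposal(decomposed))
```

A dot above after any other letter (for example 'e' followed by U+0307) is
left alone in this sketch; only the sequence after 'i' is affected.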

>Antoine


