RE: Case mapping of dotless lowercase letters

From: Philippe Verdy ([email protected])
Date: Mon Dec 15 2003 - 19:12:55 EST

Next message: Doug Ewell: "Re: [OT] CJK -> CJC (Re: Corea?)"

Previous message: Benjamin Peterson: "Re: Swastika to be banned by Microsoft?"
In reply to: Markus Scherer: "Re: Case mapping of dotless lowercase letters"
Next in thread: Doug Ewell: "Re: Case mapping of dotless lowercase letters"
Reply: Doug Ewell: "Re: Case mapping of dotless lowercase letters"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]
Mail actions: [ respond to this message ] [ mail a new topic ]

Markus Scherer wrote:
> It still comes back to what Doug said: The default rules make
> sense for most languages, while in
> order to make sense for Turkic languages, you must use special
> rules for them. There is no way
> around it - it comes from the fact that they use the same letters
> in a different way.

You misread my point: I am not interested in the Turkic case, but in NON-Turkic
languages, exactly with the default rule, which:
- does not differentiate the dotted uppercase I from the undotted uppercase I
when case folding them to the SAME soft-dotted lowercase i,
- but DOES differentiate the soft-dotted lowercase i from the dotless
lowercase i, even though the uppercase mapping drops that difference!

This means that, for non-Turkic languages or in a locale-neutral environment,
two characters that remain distinct when case folded lose that distinction
when converted to uppercase.

This problem does not occur with the Turkic case folding rules, so
Turkish and Azeri names are not affected by it for the lowercase dotless
i!
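A minimal sketch of the Turkic behaviour described above, in Python. The tables are hand-written assumptions covering only the four i-letters (Python's built-in string methods are locale-neutral), and the names turkic_upper, turkic_lower and turkic_fold are hypothetical helpers, not part of any library.

```python
# Hand-written Turkic tables: an assumption for illustration,
# special-casing only the dotted/dotless i letters.
TR_UPPER = {"i": "\u0130", "\u0131": "I"}   # i -> İ, ı -> I
TR_LOWER = {"I": "\u0131", "\u0130": "i"}   # I -> ı, İ -> i


def turkic_upper(s: str) -> str:
    """Uppercase with the Turkic i-letter exceptions."""
    return "".join(TR_UPPER.get(ch, ch.upper()) for ch in s)


def turkic_lower(s: str) -> str:
    """Lowercase with the Turkic i-letter exceptions."""
    return "".join(TR_LOWER.get(ch, ch.lower()) for ch in s)


def turkic_fold(s: str) -> str:
    """Case fold using the T mappings (0049 -> 0131, 0130 -> 0069)."""
    return "".join(TR_LOWER.get(ch, ch.casefold()) for ch in s)


for ch in ("i", "\u0131"):
    # Folding and the upper/lower round trip agree for both letters.
    print(repr(ch), turkic_fold(ch) == turkic_lower(turkic_upper(ch)))
```

With these tables the fold and the round trip give the same result for both the dotted and the dotless lowercase i.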

So it's a consistency problem; even for German we already have:
 LocaleNeutralFullCaseFolding(<Ess-Tsett>) =
 <small s, small s>;
 LowerCase(UpperCase(<Ess-Tsett>)) =
 LowerCase(<Capital S, Capital S>) =
 <small s, small s>;
 Both results are equal as expected.
and:
 TurkicFullCaseFolding(<small dotless i>) =
 <small dotless i>;
 LowerCase(UpperCase(<small dotless i>)) =
 LowerCase(<Capital (dotless) I>) =
 <small dotless i>;
 Both results are equal as expected.
but:
 LocaleNeutralFullCaseFolding(<small dotless i>) =
 <small dotless i>;
 LowerCase(UpperCase(<small dotless i>)) =
 LowerCase(<Capital (dotless) I>) =
 <small (soft-dotted) i>;
 Both results are unexpectedly different.
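
The locale-neutral comparison can be checked with a short Python sketch. It assumes that Python 3's str.casefold(), str.upper() and str.lower() follow the default (locale-neutral) Unicode full case mappings; the helper names fold and round_trip are introduced here only for illustration.

```python
SHARP_S = "\u00df"     # LATIN SMALL LETTER SHARP S
DOTLESS_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I


def fold(s: str) -> str:
    """Locale-neutral full case folding as implemented by Python."""
    return s.casefold()


def round_trip(s: str) -> str:
    """LowerCase(UpperCase(s)) with the locale-neutral mappings."""
    return s.upper().lower()


for name, ch in (("sharp s", SHARP_S), ("dotless i", DOTLESS_I)):
    # Compare the folded form with the upper/lower round trip.
    print(name, repr(fold(ch)), repr(round_trip(ch)), fold(ch) == round_trip(ch))
```

For the sharp s both forms are "ss"; for the dotless i the fold keeps U+0131 while the round trip yields a plain "i".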

The last two results would be identical, as expected, if we had a rule in
CaseFolding.txt so that:
LocaleNeutralFullCaseFolding(<small dotless i>) =
<small (soft-dotted) i>

And this rule is possible without even breaking the current rules for Turkic
languages, just by adding these two lines to CaseFolding.txt:

0131; F; 0069; # LATIN SMALL LETTER DOTLESS I
0131; T; 0131; # LATIN SMALL LETTER DOTLESS I
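
A rough sketch of what the proposed locale-neutral mapping would do, again in Python. The PROPOSED_EXTRA table and the proposed_fold function are hypothetical: they layer the suggested mapping of the small dotless i to the plain i on top of Python's built-in casefold() and are not part of CaseFolding.txt or of any library.

```python
# Hypothetical extra mapping from the proposal: dotless i folds to plain i.
PROPOSED_EXTRA = {"\u0131": "i"}


def proposed_fold(s: str) -> str:
    """Default full case folding plus the proposed dotless-i mapping."""
    return "".join(PROPOSED_EXTRA.get(ch, ch.casefold()) for ch in s)


def round_trip(s: str) -> str:
    """LowerCase(UpperCase(s)) with the locale-neutral mappings."""
    return s.upper().lower()


dotless_i = "\u0131"
# With the extra mapping, folding agrees with the upper/lower round trip.
print(proposed_fold(dotless_i) == round_trip(dotless_i))
# Side effect: dotless i and plain i now fold to the same string.
print(proposed_fold(dotless_i) == proposed_fold("i"))
```

Under this sketch, the locale-neutral fold no longer distinguishes the dotless i from the plain i, which is exactly the distinction the uppercase mapping already drops.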




This archive was generated by hypermail 2.1.5 : Mon Dec 15 2003 - 20:29:48 EST