RE: Case mapping of dotless lowercase letters

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Dec 17 2003 - 06:29:05 EST

Next message: Philippe Verdy: "RE: [OT] CJK -> CJC (Re: Corea?)"

Previous message: jon@hackcraft.net: "RE: [OT] CJK -> CJC (Re: Corea?)"
In reply to: Peter Kirk: "Re: Case mapping of dotless lowercase letters"
Next in thread: Kent Karlsson: "RE: Case mapping of dotless lowercase letters"
Reply: Kent Karlsson: "RE: Case mapping of dotless lowercase letters"

Peter Kirk wrote:
> This implies (since there are no decomposition exclusions) that NFD,
> used on Turkic text, violates the very sensible rule DO NOT USE
> COMBINING DOTS WITH I's, and leads to all sorts of potential confusion
> e.g. that both simple and full case folding and lowercasing applied to
> NFD Turkic text generate the nonsensical <i, dot above>. This could be a
> serious problem - although one that may not be worth fixing.

Yes, NFD is an issue, but not a critical one, because the decomposition is
canonical and is not excluded from recomposition.
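
As an illustration of the point above, here is a minimal sketch using Python's
standard unicodedata module (the choice of Python and the variable names are
mine, not part of the original discussion). It decomposes <dotted-I> with NFD
and checks that NFC recomposes it to the same single character:

```python
import unicodedata

DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# NFD splits the precomposed letter into <I, COMBINING DOT ABOVE>.
nfd = unicodedata.normalize("NFD", DOTTED_CAPITAL_I)

# NFC puts it back together, since the decomposition is canonical and
# the character is not excluded from recomposition.
nfc = unicodedata.normalize("NFC", nfd)

print([f"U+{ord(c):04X}" for c in nfd])  # ['U+0049', 'U+0307']
print(nfc == DOTTED_CAPITAL_I)           # True
```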

However, you're wrong on one point: only Full CaseFolding generates
<i, dot-above> from <dotted-I>. The default lowercase mapping in the UCD does
not (there it is just left unchanged), and neither does the locale-specific
"tr"/"az" lowercase mapping, which maps it to <(soft-dotted-)i>.
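
To make the contrast concrete, here is a rough Python sketch. str.casefold()
is Python's built-in full case folding; turkic_lower() is a hypothetical helper
written only for this example (Python has no built-in locale-sensitive
lowercasing), and it assumes that composing the input with NFC first is
acceptable:

```python
import unicodedata

def turkic_lower(text: str) -> str:
    """Hypothetical "tr"/"az" lowercasing, for illustration only."""
    # Compose first so that <I, dot-above> becomes the single <dotted-I>.
    text = unicodedata.normalize("NFC", text)
    # <dotted-I> maps to plain i; plain I maps to dotless i (U+0131).
    text = text.replace("\u0130", "i").replace("I", "\u0131")
    # Everything else uses the ordinary lowercase mapping.
    return text.lower()

# Full case folding of <dotted-I> yields <i, dot-above>.
folded = "\u0130".casefold()
print([f"U+{ord(c):04X}" for c in folded])  # ['U+0069', 'U+0307']

# The Turkic mapping yields a plain i, with no combining dot.
print(turkic_lower("\u0130stanbul"))  # istanbul
print(turkic_lower("ISPARTA"))        # ısparta
```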

Typical Turkish and Azeri texts will not use <dot-above>, except in the NFD
form <I, dot-above> of <dotted-I>. The <i, dot-above> result is needed only
in the Full CaseFolding mapping, so that folding respects canonical
equivalence.
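
The canonical-equivalence argument can be checked with a short Python sketch
(canon_eq is a helper name I made up for this example; it simply compares NFD
forms). Folding the precomposed and the decomposed spelling gives the same
result, whereas a folding of <dotted-I> to a bare <i> would not match the
folded NFD spelling:

```python
import unicodedata

def canon_eq(a: str, b: str) -> bool:
    """Hypothetical helper: canonical equivalence via NFD comparison."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

precomposed = "\u0130"                                   # <dotted-I>
decomposed = unicodedata.normalize("NFD", precomposed)   # <I, dot-above>

# Both spellings fold to <i, dot-above>, so equivalence is preserved.
same_after_folding = canon_eq(precomposed.casefold(), decomposed.casefold())

# If <dotted-I> folded to a bare "i" instead, the two spellings would
# no longer be canonically equivalent after folding.
bare_i_matches = canon_eq("i", decomposed.casefold())

print(same_after_folding)  # True
print(bare_i_matches)      # False
```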

I do hope that dotless-j and dotted-J will avoid these confusions, by not
trying to decompose dotted-J in the NFD form, and by not generating <j,
dot-above> in Full CaseFolding of <dotted-J>, but just <(soft-dotted-)j>. Or
will it add more confusion there, if j is treated differently from i?



This archive was generated by hypermail 2.1.5 : Wed Dec 17 2003 - 07:05:13 EST