From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Dec 17 2003 - 06:29:05 EST
Peter Kirk wrote:
> This implies (since there are no decomposition exclusions) that NFD,
> used on Turkic text, violates the very sensible rule DO NOT USE
> COMBINING DOTS WITH I's, and leads to all sorts of potential confusion
> e.g. that both simple and full case folding and lowercasing applied to
> NFD Turkic text generate the nonsensical <i, dot above>. This could be a
> serious problem - although one that may not be worth fixing.
Yes, NFD is an issue, but not a critical one, because the decomposition is
canonical and is not excluded from recomposition.
However, you're wrong on one point: only Full CaseFolding generates
<i, dot-above> from <dotted-I>. The default lowercase mapping in the UCD
leaves it unchanged, and the locale-specific "tr"/"az" lowercase mapping
maps it to <(soft-dotted-)i>.
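To make the "tr"/"az" behaviour concrete, here is a rough Python sketch. The
name turkic_lower is a hypothetical helper of mine, not a library function,
and it only covers the two I cases discussed here, assuming the standard
unicodedata module:

```python
import unicodedata

def turkic_lower(text: str) -> str:
    """Hypothetical helper: lowercase using the Turkish/Azeri rules for I."""
    # Compose first, so <I, dot-above> becomes the single dotted-I (U+0130).
    text = unicodedata.normalize("NFC", text)
    # Assumed tr/az tailoring: dotted-I -> i, plain I -> dotless-i (U+0131).
    table = {0x0130: "i", 0x0049: "\u0131"}
    # Everything else falls through to the default lowercase mapping.
    return text.translate(table).lower()

print(turkic_lower("\u0130STANBUL"))   # composed input
print(turkic_lower("I\u0307STANBUL"))  # NFD input
```

Both inputs should come out without any combining dot above, which is the
point of mapping <dotted-I> straight to <i> in these locales.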
Typical Turkish and Azeri texts will not use <dot-above>, except in the NFD
form <I, dot-above> for <dotted-I>, which is just needed because of the Full
CaseFolding mapping to make it respect canonical equivalence.
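A small check of this, sketched in Python under the assumption that
str.casefold() applies the full case folding and that unicodedata follows
the canonical decompositions of the UCD version bundled with the interpreter:

```python
import unicodedata

DOTTED_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE

# Canonical decomposition and recomposition of dotted-I.
nfd = unicodedata.normalize("NFD", DOTTED_I)
nfc = unicodedata.normalize("NFC", nfd)

# Full case folding of the composed and the decomposed form.
folded_composed = DOTTED_I.casefold()
folded_decomposed = nfd.casefold()

print(nfd == "I" + DOT_ABOVE)                # decomposes to <I, dot-above>
print(nfc == DOTTED_I)                       # and recomposes to dotted-I
print(folded_composed == folded_decomposed)  # folding agrees on both forms
```

The composed and decomposed forms fold to the same string, <i, dot-above>,
so the folding does not break canonical equivalence.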
I do hope that dotless-j and dotted-J will avoid these confusions, by not
trying to decompose dotted-J in the NFD form, and by not generating <j,
dot-above> in the Full CaseFolding of <dotted-J>, but just <(soft-dotted-)j>.
Or will that add more confusion, if j is treated differently from i?