RE: Case mapping of dotless lowercase letters

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Dec 17 2003 - 06:29:05 EST

    Peter Kirk wrote:
    > This implies (since there are no decomposition exclusions) that NFD,
    > used on Turkic text, violates the very sensible rule DO NOT USE
    > COMBINING DOTS WITH I's, and leads to all sorts of potential confusion
    > e.g. that both simple and full case folding and lowercasing applied to
    > NFD Turkic text generate the nonsensical <i, dot above>. This could be a
    > serious problem - although one that may not be worth fixing.

    Yes, NFD is an issue, but not a critical one: the decomposition is
    canonical and is not excluded from recomposition, so NFC restores
    <dotted-I>.
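    As a minimal sketch of that round trip, assuming Python's standard
    unicodedata module and whatever Unicode data version ships with the
    interpreter (the helper name code_points is only illustrative):

```python
import unicodedata

DOTTED_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

def code_points(s):
    """Illustrative helper: list the code points of a string as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in s]

# NFD splits <dotted-I> into <I, dot-above>.
nfd = unicodedata.normalize("NFD", DOTTED_I)

# NFC recomposes it, because U+0130 is not a composition exclusion.
nfc = unicodedata.normalize("NFC", nfd)

print(code_points(nfd))  # ['U+0049', 'U+0307']
print(code_points(nfc))  # ['U+0130']
```

    The combining dot above therefore only appears in the decomposed
    form and disappears again as soon as the text is renormalized to NFC.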

    However, you're wrong on one point: only Full CaseFolding generates
    <i, dot-above> from <dotted-I>. The default lowercase mapping in the UCD
    does not (it leaves <dotted-I> unchanged), and neither does the
    locale-specific "tr"/"az" lowercase mapping, which maps it to
    <(soft-dotted-)i>.
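    The two mappings can be compared in a short Python sketch. Here
    str.casefold() stands in for Full CaseFolding as implemented with the
    Unicode data bundled with the interpreter, which may differ from the
    UCD version discussed here. The function turkic_lower is hypothetical:
    Python has no locale-aware lowercasing in the standard library, so the
    "tr"/"az" rule is written out by hand, and the NFC step is a
    simplification that also recomposes the rest of the text.

```python
import unicodedata

DOTTED_I = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I

def turkic_lower(text: str) -> str:
    """Hypothetical "tr"/"az" lowercasing, written out by hand."""
    # Recompose first, so that NFD <I, dot-above> is seen as <dotted-I>.
    text = unicodedata.normalize("NFC", text)
    # <dotted-I> maps to plain i, and plain I maps to dotless i.
    text = text.replace(DOTTED_I, "i").replace("I", DOTLESS_I)
    # Every other letter uses the default lowercase mapping.
    return text.lower()

# Full case folding of <dotted-I> yields <i, dot-above>.
full_fold = DOTTED_I.casefold()
print([hex(ord(c)) for c in full_fold])  # ['0x69', '0x307']

# The Turkic mapping never produces a combining dot above.
print(turkic_lower("\u0130STANBUL"))  # istanbul
print(turkic_lower("I\u0307"))        # i
```

    Under these assumptions the combining dot shows up only in the
    folded form, never in the Turkic lowercase result.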

    Typical Turkish and Azeri texts will not use <dot-above>, except in the
    NFD form <I, dot-above> of <dotted-I>. The Full CaseFolding mapping to
    <i, dot-above> is needed only so that case folding respects canonical
    equivalence.
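    The canonical-equivalence argument can be checked with a small
    Python sketch. Both helper functions are illustrative assumptions:
    canonically_equal compares NFD forms, and alt_fold is a hypothetical
    folding that maps <dotted-I> straight to plain i, to show what would
    break. str.casefold() again uses the Unicode data bundled with the
    interpreter.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def alt_fold(s: str) -> str:
    """Hypothetical folding that maps <dotted-I> to plain i instead."""
    return s.replace("\u0130", "i").casefold()

composed = "\u0130"                                  # <dotted-I>
decomposed = unicodedata.normalize("NFD", composed)  # <I, dot-above>

# Full case folding gives <i, dot-above> for both inputs.
full_ok = canonically_equal(composed.casefold(), decomposed.casefold())

# The hypothetical folding gives <i> for one and <i, dot-above> for the other.
alt_ok = canonically_equal(alt_fold(composed), alt_fold(decomposed))

print(full_ok)  # True
print(alt_ok)   # False
```

    With the hypothetical alt_fold, two canonically equivalent inputs
    fold to results that are no longer equivalent, which is the situation
    the <i, dot-above> mapping avoids.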

    I do hope that dotless-j and dotted-J will avoid these confusions, by not
    trying to decompose dotted-J in NFD, and by not generating <j, dot-above>
    in the Full CaseFolding of <dotted-J>, but just <(soft-dotted-)j>. Or
    would that add more confusion, if j is treated differently from i?

    This archive was generated by hypermail 2.1.5 : Wed Dec 17 2003 - 07:05:13 EST