RE: Case mapping of dotless lowercase letters

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Dec 15 2003 - 21:20:36 EST


    Doug Ewell wrote:
    > Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
    > > You have not read: I'm not interested in the Turkic case, but in NON
    > > Turkic languages, exactly with the default rule which:
    > > - does not differentiate the dotted uppercase I and the undotted
    > > uppercase I when casefolding them to the SAME soft-dotted lowercase i.
    > > - but DOES differentiate the soft-dotted lowercase i and the dotless
    > > lowercase i, despite the uppercase mapping will drop that difference!
    >
    > There may be a problem here, but the urgency seems very slight;

    I detected it after it produced a security bug (a user record was
    unexpectedly updated in my database...)

    > you'll probably never find dotted uppercase I

    right.

    > and dotless lowercase i in non-Turkic languages.

    Wrong here: I have found occurrences of dotless lowercase i, used instead of
    soft-dotted lowercase i, as the base letter for a diacritic added above it
    (it was an acute accent...)

    There were two sequences which looked identical when rendered, but which
    remained distinct when compared after case folding:

    (1) LATIN SMALL LETTER I, COMBINING ACUTE ACCENT
    (2) LATIN SMALL LETTER DOTLESS I, COMBINING ACUTE ACCENT

    but were no longer distinct when converted to uppercase in a locale-neutral
    environment not using the Turkic rules:

    (1') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT
    (2') LATIN CAPITAL LETTER I, COMBINING ACUTE ACCENT
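
    A minimal Python sketch of the mismatch described above. It assumes that
    Python's built-in str.casefold() and str.upper() are acceptable stand-ins
    for the locale-neutral default mappings; it is an illustration only, not
    the code that ran against the database. The variable names are chosen for
    this example.

```python
import unicodedata

# The two sequences from the message, written with escapes so the
# difference between them stays visible in source form.
s1 = "i\u0301"       # (1) LATIN SMALL LETTER I + COMBINING ACUTE ACCENT
s2 = "\u0131\u0301"  # (2) LATIN SMALL LETTER DOTLESS I + COMBINING ACUTE ACCENT

# Default case folding leaves U+0131 unchanged, so the strings stay distinct.
folded_equal = s1.casefold() == s2.casefold()

# Default uppercasing maps both U+0069 and U+0131 to U+0049, so they merge.
upper_equal = s1.upper() == s2.upper()

# Character names of the uppercased form of sequence (2).
upper_names = [unicodedata.name(c) for c in s2.upper()]

print("equal after casefold():", folded_equal)
print("equal after upper():   ", upper_equal)
print("upper(2) =", upper_names)
```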

    The string (2) may have been produced to avoid displaying the dot with some
    fonts that don't apply the soft-dotted rule when there's an additional
    diacritic above...

    For me, strings (1) and (2) are "equivalent" in non-Turkic, locale-neutral
    environments, and should compare equal in case-insensitive comparisons,
    exactly like (1') and (2'), their uppercase equivalents.
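
    One possible way to get that behaviour, sketched in Python under the
    assumption that uppercasing before folding is acceptable for the data
    in question. The helper name neutral_key is hypothetical, and this is
    not a standard Unicode algorithm: it deliberately discards the
    dotted/dotless distinction, so it would be wrong for Turkic text.

```python
def neutral_key(s: str) -> str:
    """Comparison key for locale-neutral, case-insensitive matching.

    Uppercasing first merges U+0131 (dotless i) with U+0069 (i), because
    both map to U+0049; the subsequent case folding then brings the
    result to a single folded form.
    """
    return s.upper().casefold()


s1 = "i\u0301"       # (1) i + COMBINING ACUTE ACCENT
s2 = "\u0131\u0301"  # (2) dotless i + COMBINING ACUTE ACCENT
s3 = "I\u0301"       # (1') / (2') I + COMBINING ACUTE ACCENT

print(neutral_key(s1) == neutral_key(s2))  # sequences (1) and (2)
print(neutral_key(s2) == neutral_key(s3))  # sequence (2) and its uppercase
```

    A key like this would be applied on both sides of the comparison (for
    example to the stored value and to the user-supplied value), never to
    only one of them.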






    This archive was generated by hypermail 2.1.5 : Mon Dec 15 2003 - 22:03:14 EST