RE: Case mapping of dotless lowercase letters

From: Arcane Jill (arcanejill@ramonsky.com)
Date: Wed Dec 17 2003 - 08:30:55 EST


    Far be it from me to stir things up even further, but...

    QUESTION - Is the rendering of {U+0069} {U+0302} (that's <i, combining
    circumflex accent>) locale-dependent?

    I may have got this totally wrong, but it occurs to me that in
    non-Turkic fonts, U+0069 is "soft-dotted". That is, the dot disappears
    in the presence of any COMBINING....ABOVE modifier. But in Turkic,
    U+0069 is "hard-dotted", so the dot must not be removed if a circumflex
    is added. I freely admit I don't know whether Turkic uses circumflex or
    not, but the question will work just as well with /any/
    COMBINING....ABOVE modifier.

    If this is so, how can a character be considered "soft-dotted" in one
    locale and "hard-dotted" in another?

    Would it not make more sense to have not two, but /three/ different
    kinds of lowercase i: <non-dotted i>, <soft-dotted i> and <hard-dotted
    i>? (And similarly for uppercase). Of course, then you might as well
    invent COMBINING SOFT DOT ABOVE so we can use it elsewhere.

    It gets better. (You're gonna hate me). If we then make the set {
    soft-dotted-i, soft-dotted-I, non-dotted-i, non-dotted-I } a casefold
    equivalence class which lowercases to <soft-dotted-i> (except in the
    Turkic locale, where it lowercases to non-dotted-i), and uppercases to
    <non-dotted-I> in all locales; and if we similarly make { hard-dotted-i,
    hard-dotted-I } a separate casefold equivalence class lowercasing to
    <hard-dotted-i> and uppercasing to <hard-dotted-I> (in all locales),
    then all of the problems outlined by Philippe would go away. And we
    could do the same with j too.
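    The scheme above can be modelled as a toy in Python. This is only a
    sketch of the proposal as worded in this message: none of these six
    letters exist as distinct Unicode characters, and the names Letter,
    SOFT_CLASS, HARD_CLASS, to_lower, to_upper and casefold_equal are all
    hypothetical.

```python
from enum import Enum


class Letter(Enum):
    """Hypothetical abstract letters from the proposal (not real code points)."""
    SOFT_LOWER = "soft-dotted-i"
    SOFT_UPPER = "soft-dotted-I"
    NON_LOWER = "non-dotted-i"
    NON_UPPER = "non-dotted-I"
    HARD_LOWER = "hard-dotted-i"
    HARD_UPPER = "hard-dotted-I"


# The two casefold equivalence classes described in the message.
SOFT_CLASS = frozenset({Letter.SOFT_LOWER, Letter.SOFT_UPPER,
                        Letter.NON_LOWER, Letter.NON_UPPER})
HARD_CLASS = frozenset({Letter.HARD_LOWER, Letter.HARD_UPPER})


def to_lower(ch, turkic=False):
    """Lowercase a letter; only the soft/non-dotted class is locale-sensitive."""
    if ch in SOFT_CLASS:
        return Letter.NON_LOWER if turkic else Letter.SOFT_LOWER
    return Letter.HARD_LOWER


def to_upper(ch):
    """Uppercase a letter; the result is the same in every locale."""
    if ch in SOFT_CLASS:
        return Letter.NON_UPPER
    return Letter.HARD_UPPER


def casefold_equal(a, b):
    """Two letters match caselessly when they fall in the same class."""
    return (a in SOFT_CLASS) == (b in SOFT_CLASS)


print(to_lower(Letter.NON_UPPER).value)
print(to_lower(Letter.NON_UPPER, turkic=True).value)
print(to_upper(Letter.HARD_LOWER).value)
```

    In this model the hard-dotted pair round-trips through upper and lower
    case in every locale, while the locale-sensitive step is confined to
    lowercasing within the first class.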

    Of course - it would have one nasty side-effect. The Turks would then
    have to use <hard-dotted-i> instead of <soft-dotted-i>, but since the
    characters (in this new scheme) now have completely different meanings,
    that's fair enough. Hey ho.

    Just musing....
    Jill


