Re: Case mapping of dotless lowercase letters

From: Peter Kirk (peterkirk@qaya.org)
Date: Wed Dec 17 2003 - 14:51:23 EST

    On 17/12/2003 11:29, Philippe Verdy wrote:

    >Peter Kirk wrote:
    >
    >
    >>Conclusion: the right thing even for Turkish is to drop the dot on i
    >>before a circumflex.
    >>
    >>
    >
    >I agree. The letter is rare enough not to create an exception here for
    >the removal of the dot on the soft-dotted i followed by circumflex (which
    >is needed much more often in other languages that use 'î' and 'Î').
    >
    >
    >
    I'm not sure that rarity is a good argument, but we agree on the conclusion.

    >>But by the same argument we would also want to drop
    >>the dot on dotless I.
    >>
    >>
    >
    >I think you meant "But by the same argument we would also want to drop
    >the dot on DOTTED I". I would not recommend it; this would make things
    >even worse and more complicated.
    >
    >
    >
    Indeed. Thank you for correcting my error.

    >If Turkish wants to remove the dot on "pseudo-dotted" I if followed by
    >a circumflex, the correct thing to do is then to use the ASCII dotless
    >I and add a circumflex or use its canonical equivalent
    ><LATIN CAPITAL LETTER I WITH CIRCUMFLEX>.
    >
    >With the current specification, both of
    > <LATIN CAPITAL LETTER I, COMBINING CIRCUMFLEX>, and
    > <LATIN CAPITAL LETTER I WITH CIRCUMFLEX>
    >are canonical equivalents and must render the same, without the dot.
    >
    >To display a dot, one can use one of the four canonical equivalents:
    > <LATIN CAPITAL LETTER I WITH DOT ABOVE, COMBINING CIRCUMFLEX>
    > <LATIN CAPITAL LETTER I WITH CIRCUMFLEX, COMBINING DOT ABOVE>
    > <LATIN CAPITAL LETTER I, COMBINING DOT ABOVE, COMBINING CIRCUMFLEX>
    > <LATIN CAPITAL LETTER I, COMBINING CIRCUMFLEX, COMBINING DOT ABOVE>
    >(one is the NFC form, another is the NFD form, two others are also
    >possible)
    >
    >
    >
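The equivalences listed above can be checked mechanically. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `canonically_equivalent` is made up for this illustration. It compares sequences by their NFD forms. Run against the Unicode data shipped with Python, it confirms that the two undotted sequences are equivalent, but it suggests that the four dotted sequences fall into two equivalence classes rather than one: COMBINING DOT ABOVE and COMBINING CIRCUMFLEX ACCENT both have combining class 230, so normalization does not reorder them relative to each other.

```python
import unicodedata

I_DOT = "\u0130"   # LATIN CAPITAL LETTER I WITH DOT ABOVE
I_CIRC = "\u00CE"  # LATIN CAPITAL LETTER I WITH CIRCUMFLEX
DOT = "\u0307"     # COMBINING DOT ABOVE
CIRC = "\u0302"    # COMBINING CIRCUMFLEX ACCENT


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    when their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# The two sequences that render without a dot.
undotted = ["I" + CIRC, I_CIRC]

# The four sequences proposed for rendering with a dot.
dotted = [
    I_DOT + CIRC,        # <I WITH DOT ABOVE, CIRCUMFLEX>
    I_CIRC + DOT,        # <I WITH CIRCUMFLEX, DOT ABOVE>
    "I" + DOT + CIRC,    # <I, DOT ABOVE, CIRCUMFLEX>
    "I" + CIRC + DOT,    # <I, CIRCUMFLEX, DOT ABOVE>
]

if __name__ == "__main__":
    # Print the NFD form of each dotted sequence as code points.
    for seq in dotted:
        nfd = unicodedata.normalize("NFD", seq)
        print(" ".join(f"U+{ord(c):04X}" for c in nfd))
```

If the order of the two marks matters for rendering (dot below the circumflex, or above it), then the choice between the two classes is significant and not merely a matter of normalization form.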
    The problem which might arise is when someone applies Turkic casing
    operations to a Turkish text including i with circumflex (at least in
    NFD). The small version becomes <dotted I, circumflex>, which is wrong
    and looks wrong. The capital version becomes <dotless i, circumflex>,
    which is wrong but looks correct.

    So the rules should be adjusted so that the normal casing rule, not the
    special Turkic one, applies when there is a circumflex. But perhaps not
    when there are other accents: e.g. I would expect that an acute is
    sometimes used as a stress marker, but it would then need to appear in
    addition to the dot on i and dotted I, cf. Lithuanian.
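The adjustment proposed here can be sketched in Python. This is only an illustration under stated assumptions, not an implementation of any standard casing algorithm: the function names `turkic_upper` and `turkic_lower` and the `circumflex_exception` flag are invented for the example, the exception is triggered only by a circumflex immediately following the base letter in NFD, and other accents (such as the acute) are deliberately left under the Turkic rule.

```python
import unicodedata

DOT = "\u0307"              # COMBINING DOT ABOVE
CIRC = "\u0302"             # COMBINING CIRCUMFLEX ACCENT
DOTLESS_SMALL_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I
CAPITAL_I_DOT = "\u0130"    # LATIN CAPITAL LETTER I WITH DOT ABOVE


def turkic_upper(text: str, circumflex_exception: bool = True) -> str:
    """Uppercase with the Turkic i -> dotted I mapping (hypothetical helper).

    With circumflex_exception=True, an i directly followed by a
    circumflex takes the default mapping i -> I instead.
    """
    nfd = unicodedata.normalize("NFD", text)
    out = []
    for idx, ch in enumerate(nfd):
        before_circ = nfd[idx + 1:idx + 2] == CIRC
        if ch == "i" and not (circumflex_exception and before_circ):
            out.append(CAPITAL_I_DOT)
        else:
            out.append(ch.upper())
    return unicodedata.normalize("NFC", "".join(out))


def turkic_lower(text: str, circumflex_exception: bool = True) -> str:
    """Lowercase with the Turkic I -> dotless i mapping (hypothetical helper).

    With circumflex_exception=True, an I directly followed by a
    circumflex takes the default mapping I -> i instead.
    """
    nfd = unicodedata.normalize("NFD", text)
    out = []
    idx = 0
    while idx < len(nfd):
        ch = nfd[idx]
        nxt = nfd[idx + 1:idx + 2]
        if ch == "I" and nxt == DOT:
            # Dotted capital I in decomposed form: lowercase to plain i
            # and consume the combining dot.
            out.append("i")
            idx += 2
            continue
        if ch == "I" and not (circumflex_exception and nxt == CIRC):
            out.append(DOTLESS_SMALL_I)
        else:
            out.append(ch.lower())
        idx += 1
    return unicodedata.normalize("NFC", "".join(out))


if __name__ == "__main__":
    for word in ("mill\u00EE", "iyi"):
        print(ascii(turkic_upper(word, circumflex_exception=False)),
              ascii(turkic_upper(word)))
```

With the exception disabled, the sketch reproduces the two problem cases described above: small i with circumflex uppercases to dotted I plus circumflex, and capital I with circumflex lowercases to dotless i plus circumflex. With it enabled, both round-trip through the ordinary î / Î pair.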

    -- 
    Peter Kirk
    peter@qaya.org (personal)
    peterkirk@qaya.org (work)
    http://www.qaya.org/
    

