Re: Handling atomically composite character sequences

From: Philippe Verdy
Date: Sat Jun 11 2005 - 12:44:34 CDT


    From: "Michael Everson"
    > At 11:51 -0700 2005-06-10, Patrick Andries wrote:
    >>The fact is that there is still no good solution today (but I admit there
    >>is a viable one) for those who say: I type a single N^ in Malagasy, to me
    >>this is a single letter, and I don't want to have to press Backspace twice
    >>every time I delete this letter because Unicode forces this single letter
    >>to be encoded as a precomposed sequence.
    > Is this theory, or an actual complaint from actual users?

    However Unicode ends up encoding this character, the whole combining
    sequence is a single default grapheme cluster. So it is up to the editing
    software to offer a coherent editing mode that meets your needs. Notably,
    if your keyboard driver lets you compose the accented letter with dead
    keys in your preferred keyboard layout, the software will not see the
    character broken into distinct parts as it is composed. Instead, all
    characters will be entered as complete grapheme clusters, and should be
    corrected at that same level.
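
    To make the "single default grapheme cluster" point concrete, here is a
    rough Python sketch. It is not the full segmentation algorithm of UAX #29:
    it assumes that a cluster is just a base character followed by any
    characters in a Unicode "Mark" general category, which is enough for a
    Latin letter with a combining accent. The name simple_clusters is made up
    for this illustration.

```python
import unicodedata

def simple_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplifying assumption: any character whose general category starts
    with "M" (Mn, Mc, Me) extends the previous cluster. Real default
    grapheme clusters (UAX #29) have more rules than this.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch      # attach the mark to the preceding base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters

# N + COMBINING CIRCUMFLEX ACCENT, followed by a plain "y"
word = "N\u0302y"
print(len(word))                    # number of code points
print(len(simple_clusters(word)))   # number of (simplified) clusters
```

    Under these assumptions the string holds three code points but only two
    clusters, so an editor working at the cluster level would treat the
    accented N as one unit.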

    If your keyboard driver does not use dead keys, but lets you enter
    diacritics separately, then the appropriate editing level for corrections
    is character by character. I doubt this will be the case, given that
    almost all other Latin-based languages are typed with dead keys, which
    compose letters at the grapheme cluster level.

    So blame your editor if it forces you to press Backspace twice to erase the
    complete grapheme cluster. (Yes, Notepad acts like this on Windows: is it
    a feature or a defect?)
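
    The two Backspace behaviours can be compared in a few lines of Python.
    This is only an illustration, not how Notepad or any real editor is
    implemented: both function names are hypothetical, and the cluster-aware
    version assumes a cluster ends in zero or more "Mark" characters preceded
    by one base character, which is a simplification of UAX #29.

```python
import unicodedata

def backspace_code_point(text):
    """Delete only the last code point (the two-keypress behaviour)."""
    return text[:-1]

def backspace_cluster(text):
    """Delete the last base character together with its trailing marks.

    Simplifying assumption: trailing characters whose general category
    starts with "M" belong to the same cluster as the base before them.
    """
    end = len(text)
    # Skip backwards over any trailing combining marks.
    while end > 0 and unicodedata.category(text[end - 1]).startswith("M"):
        end -= 1
    # Then drop the base character itself, if there is one.
    return text[:max(end - 1, 0)]

text = "aN\u0302"                    # "a", then N + COMBINING CIRCUMFLEX ACCENT
once = backspace_code_point(text)    # accent removed, bare "N" left behind
twice = backspace_code_point(once)   # second keypress removes the "N"
whole = backspace_cluster(text)      # one keypress removes the whole letter
print(repr(once), repr(twice), repr(whole))
```

    With the cluster-aware version a single keypress gives the same result
    that takes two keypresses in the per-code-point version.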
