Grapheme clusters

From: Chris Harvey
Date: Mon Oct 04 2004 - 21:02:02 CST



    I'm working on a language which uses grapheme clusters (e.g. <ng> or <kw>).

    The speakers want these clusters to be treated as single "letters", both
    in character counts and when backspacing, so that a word like "kwang"
    could be deleted with three backspace keystrokes.
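
    To make the requirement concrete, here is a minimal sketch of the
    behaviour being asked for, written in Python. It is not how any
    particular editor or UAX#29 implementation works: it assumes a
    hand-written digraph list (only <ng>, <kw> and <ii> come from this
    message), and the names split_letters and backspace are made up for
    illustration.

```python
# Hypothetical digraph inventory: only <ng>, <kw>, <ii> are named in the
# message; a real orthography would supply its own complete list.
DIGRAPHS = ("ng", "kw", "ii")


def split_letters(word, digraphs=DIGRAPHS):
    """Split a word into user-perceived letters.

    Digraphs are matched greedily, longest first; anything that is not
    part of a listed digraph counts as a one-character letter.
    """
    ordered = sorted(digraphs, key=len, reverse=True)
    letters = []
    i = 0
    while i < len(word):
        for d in ordered:
            if word.startswith(d, i):
                letters.append(d)
                i += len(d)
                break
        else:
            # No digraph starts here: take a single character.
            letters.append(word[i])
            i += 1
    return letters


def backspace(word, digraphs=DIGRAPHS):
    """Delete the last user-perceived letter of the word."""
    return "".join(split_letters(word, digraphs)[:-1])


print(split_letters("kwang"))       # ['kw', 'a', 'ng']
print(len(split_letters("kwang")))  # 3

word = "kwang"
presses = 0
while word:
    word = backspace(word)
    presses += 1
print(presses)                      # 3
```

    The sketch matches lower-case text only and cannot tell a true <ng>
    from an <n> that merely happens to be followed by <g>; both would need
    handling in real use.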

    I found the following information in UAX#29:

    “As far as a user is concerned, the underlying representation of text is
    not important, but it is important that an editing interface present a
    uniform implementation of what the user thinks of as characters. Grapheme
    clusters commonly behave as units in terms of mouse selection,
    arrow key movement, backspacing, and so on. When this is done, for
    example, and an accented character is represented by a combining character
    sequence, then using the right arrow key would skip from the start of the
    base character to the end of the last combining character.”

    My question is: how do I correctly take advantage of this? Would I
    encode the cluster <ng> as something like <n> + <combining grapheme
    joiner> + <g>?
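
    For reference, the sequence proposed above can be written out and
    inspected with Python's standard unicodedata module. This is only a
    sketch of what would be stored; it says nothing about whether any
    given editor or rendering engine would treat the sequence as one unit.
    The variable names are illustrative.

```python
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER
ng = "n" + CGJ + "g"

# The proposed <ng> is three code points long in storage.
print(len(ng))

# List each code point with its Unicode name and general category.
for ch in ng:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```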

    The users seem determined to put the entire alphabet into the PUA, thus
    making a single character for <ng>, <kw>, <ii> etc. I would like to be
    able to present them with something that works and avoid this kind of

    Thank you for your help.

    Chris Harvey

    Gwlad heb iaith, gwlad heb galon
