Proposal for a Unicode FAQ entry (was: combining enclosing marks for multi-letter sequences; grapheme clusters)

From: Karl Pentzlin
Date: Thu Apr 13 2006 - 15:57:08 CST


    On Thursday, 13 April 2006 at 22:27, Kenneth Whistler wrote:
    KW> Karl Pentzlin inquired:
    >> I try to understand whether "E" + CGJ + "s" + CGJ + "c" + U+20E3
    >> COMBINING ENCLOSING KEYCAP should produce a representation of an
    >> "Esc" key in plain text (given an appropriate font rendering
    >> mechanism).
    KW> It should not. ...

    I therefore propose adding an entry to the Unicode FAQ (section
    "Characters, Combining marks") along the following lines
    (assuming, of course, that I now understand the mechanism correctly):

    Q: Is it possible to apply a diacritic or combining enclosing mark to
    a sequence of more than one (non-combining) character?

    A: No, with the exception of the "double diacritics" deliberately
    designed to be applied to a two-letter sequence, e.g. U+035D
    COMBINING DOUBLE BREVE. No character (including U+034F COMBINING
    GRAPHEME JOINER) "glues" characters together in such a way that the
    scope of any following combining character would be affected.
    To enclose a character sequence like "Esc" in something like the
    U+20E3 COMBINING ENCLOSING KEYCAP, you must resort to higher-level
    protocols.
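
    To make the scope rule concrete, here is a rough Python sketch. It
    assumes a deliberately simplified model in which every character of
    general category M (Mn, Mc, Me) attaches to the nearest preceding
    non-mark character; it is not the full grapheme cluster algorithm,
    and the helper names are made up for illustration only.

```python
import unicodedata

def combining_sequences(text):
    """Split text into a base character plus the marks that follow it.

    Simplified assumption: any character whose general category starts
    with "M" belongs to the immediately preceding sequence.
    """
    seqs = []
    for ch in text:
        if seqs and unicodedata.category(ch).startswith("M"):
            seqs[-1] += ch          # mark: extend the current sequence
        else:
            seqs.append(ch)         # non-mark: start a new sequence
    return seqs

def as_code_points(seq):
    """Render a sequence as 'U+XXXX U+XXXX' for readable output."""
    return " ".join(f"U+{ord(c):04X}" for c in seq)

# "E" + CGJ + "s" + CGJ + "c" + COMBINING ENCLOSING KEYCAP
ESC_ATTEMPT = "E\u034fs\u034fc\u20e3"

print([as_code_points(s) for s in combining_sequences(ESC_ATTEMPT)])
# ['U+0045 U+034F', 'U+0073 U+034F', 'U+0063 U+20E3']
```

    Under this model the keycap ends up in the same sequence as "c"
    alone; the two CGJs merely trail "E" and "s" as marks and do not
    widen the scope of U+20E3.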

    KW> A CGJ by itself is simply a defective combining character sequence.
    KW> A CGJ does not *construct* grapheme clusters, if that is what you
    KW> are getting at.
    Maybe you can add to the FAQ entry
    "Q: Does U+034F COMBINING GRAPHEME JOINER join graphemes?"
    a statement like "In particular, it cannot be used to *construct*
    grapheme clusters out of arbitrary character sequences, or to extend
    the scope of subsequent combining characters."
    - Karl Pentzlin
