Re: combining enclosing marks for multi-letter sequences; grapheme clusters

From: Richard Wordingham
Date: Thu Apr 13 2006 - 15:55:33 CST


    Karl Pentzlin wrote on Thursday, April 13, 2006 at 9:48 AM:

    > I try to understand whether "E" + CGJ + "s" + CGJ + "c" + U+20E3
    > COMBINING ENCLOSING KEYCAP should produce a representation of an
    > "Esc" key in plain text (given an appropriate font rendering
    > mechanism).

    > I refer to the phrase "The combining enclosing marks apply to a
    > preceding default grapheme cluster." (printed edition of "The Unicode
    > Standard V4.0", p.188).

    TUS 4.0 Section 15.2 p392 says:

    "For rendering, the combining grapheme joiner is invisible. However, some
    older implementations may treat a sequence of grapheme clusters linked by
    combining grapheme joiners as a single unit for the application of enclosing
    combining marks."

    In other words, it may once have worked, but it shouldn't work now.

    > Does the Combining Grapheme Joiner (CGJ, U+034F) constitute a grapheme
    > cluster in the sense of UAX 29 "Text boundaries"?
    > I did not find any evidence there. (Maybe I overlooked something or
    > searched in the wrong place?)

    Yes - see for example

    However, that does not help, for a CGJ is the final character in a *default*
    grapheme cluster unless it is followed by another character with the
    'extend' property.
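
    To make the clustering concrete, here is a minimal Python sketch of how
    the sequence from the question splits up. It is an illustration, not an
    implementation of UAX 29: the helper names (is_extend, simple_clusters)
    are hypothetical, and general categories Mn and Me are used as a
    stand-in for the 'extend' property, which Python's unicodedata module
    does not expose. That approximation is adequate for the characters
    involved here but not in general.

```python
import unicodedata

CGJ = "\u034F"     # COMBINING GRAPHEME JOINER (general category Mn)
KEYCAP = "\u20E3"  # COMBINING ENCLOSING KEYCAP (general category Me)


def is_extend(ch):
    # Assumption: nonspacing (Mn) and enclosing (Me) marks approximate
    # the 'extend' property; the real property covers more characters.
    return unicodedata.category(ch) in ("Mn", "Me")


def simple_clusters(text):
    # Start a new cluster at every character that does not extend,
    # and attach extending characters to the cluster before them.
    clusters = []
    for ch in text:
        if clusters and is_extend(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


sequence = "E" + CGJ + "s" + CGJ + "c" + KEYCAP
clusters = simple_clusters(sequence)
for cluster in clusters:
    print(" ".join(f"U+{ord(c):04X}" for c in cluster))
```

    Under these assumptions the sequence falls into three clusters, each
    CGJ closing the cluster of the letter before it, and the keycap lands
    in the cluster that begins with "c" alone.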

    > Or is producing multi-letter key representations in plain text done
    > by another mechanism than CGJ (e.g. ZWJ), or is it subject to higher level
    > protocols at all?

    It would seem it can't be done for plain text.

