L2/11-266
Source: Mark Davis
Date: July 6, 2011
Subject: Handling of Cn/Cs/Co characters in UAX #29

Karl Williamson raised the issue of why the sequence <Cn + Extend> forms a grapheme cluster. While these are degenerate cases, the UTC should consider whether overall behavior would be better if we added the three odd-ball cases ([:cn:][:cs:][:co:]) to [:gcb:control:].
It would also make the usage align more closely with the current definition of the Grapheme_Base property. That is, if we added [:cn:][:cs:][:co:] to [:gcb:control:], then Grapheme_Base would be equivalent to all code points outside of [[:gcb:extend:][:gcb:lf:][:gcb:cr:][:gcb:control:]].
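
As an illustration only, the effect of the proposal on a <Cn + Extend> sequence can be sketched in Python. This is not the UAX #29 algorithm: it models just the two rules relevant here (break after Control, no break before Extend), and it approximates the existing [:gcb:control:] set as General_Category Zl, Zp, Cc, and Cf minus ZWNJ and ZWJ, which is an assumption. The names acts_as_control and extend_attaches are hypothetical, and results for unassigned code points depend on the Unicode version bundled with the Python runtime.

```python
import unicodedata

# General_Category values the memo proposes adding to [:gcb:control:].
PROPOSED_EXTRA = {"Cn", "Cs", "Co"}

# Assumed approximation of the existing GCB=Control set by General_Category.
# CR and LF have their own GCB values but also force a break, so treating
# them as Cc here gives the same answer for this sketch.
CURRENT_CONTROL = {"Zl", "Zp", "Cc", "Cf"}

# ZWNJ and ZWJ are Cf but are treated as Extend, not Control.
ZWNJ_ZWJ = {0x200C, 0x200D}


def acts_as_control(cp: int, proposed: bool) -> bool:
    """Return True if cp forces a grapheme cluster break after it.

    With proposed=False only the assumed existing Control set is used;
    with proposed=True, Cn/Cs/Co code points are included as well.
    """
    if cp in ZWNJ_ZWJ:
        return False
    gc = unicodedata.category(chr(cp))
    if gc in CURRENT_CONTROL:
        return True
    return proposed and gc in PROPOSED_EXTRA


def extend_attaches(base_cp: int, proposed: bool) -> bool:
    """Return True if a following Extend character joins base_cp's cluster."""
    return not acts_as_control(base_cp, proposed)
```

Under these assumptions, extend_attaches(0xE000, proposed=False) reports that an Extend character attaches to a private-use code point, while the same call with proposed=True reports a break. Ordinary letters such as U+0041 are unaffected either way.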