UTC/2000-005

Submitted by Mark Davis
January 13, 2000

CodePoint 2060, 2061
Name ZERO WIDTH GRAPHEME BREAK;
ZERO WIDTH GRAPHEME JOIN
GeneralCategory Cf - Other, Format
BlockName 2000; 206F; General Punctuation
BlockName  
MarkupUse  
InformativeAnnotations ZWGB between two characters indicates that the sequence of characters should not be treated as a single grapheme in circumstances where they otherwise would be.

ZWGJ between two characters indicates that the sequence of characters should be treated as a single grapheme in circumstances where they otherwise would not be. It can also be used within longer sequences, such as s[ZWGJ]h[ZWGJ]c[ZWGJ]h.

Both of these characters may affect semantics, e.g. collation behavior, spell checking, and word matching.
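
As an illustration only, the following Python sketch shows one way a segmenter might honor the two characters. The function name `graphemes`, the `clusters` parameter, and the treatment of "ch" as a default grapheme are assumptions made for this example and are not part of the proposal; the code points are the ones proposed above. The format characters themselves are dropped from the output.

```python
ZWGB = "\u2060"  # proposed ZERO WIDTH GRAPHEME BREAK
ZWGJ = "\u2061"  # proposed ZERO WIDTH GRAPHEME JOIN


def graphemes(text, clusters=("ch",)):
    """Split text into graphemes using a toy rule set (hypothetical helper).

    `clusters` lists sequences treated as one grapheme by default.
    ZWGB between two characters prevents such a cluster from forming;
    ZWGJ joins the units on either side of it into one grapheme.
    """
    out = []
    join_next = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ZWGJ:
            # Join the next unit to the previous one, if there is one.
            join_next = bool(out)
            i += 1
            continue
        if ch == ZWGB:
            # Its presence already interrupts any default cluster match.
            join_next = False
            i += 1
            continue
        # Use a default cluster if one starts here, else a single character.
        unit = next((c for c in clusters if text.startswith(c, i)), ch)
        i += len(unit)
        if join_next:
            out[-1] += unit
            join_next = False
        else:
            out.append(unit)
    return out
```

Under these assumptions, `graphemes("chata")` keeps "ch" together, `graphemes("c" + ZWGB + "hata")` splits it into "c" and "h", and the longer sequence s[ZWGJ]h[ZWGJ]c[ZWGJ]h comes back as a single grapheme.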

InformativeAnnotations  
CollationBehavior In the default collation ordering, these characters are completely ignorable. In a tailored collation ordering, they can be used to distinguish sequences that form graphemes from those that don't. For example, in a Slovak collation, "ch" sorts as a single collation element after "c". Spelling a word with "c[ZWGB]h" disables that contraction. (Such a spelling has no effect if there is no "ch" sequence in the collation ordering.)

Any tailored collation ordering that contains contracting elements should add ZWGJ within the sequences. E.g. Slovak should have both of the following rules:

c < ch;
ch = c[ZWGJ]h;
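
The effect of the two rules above can be sketched in Python. This is a toy model, not a conformant collation implementation: the `WEIGHTS` table, its numeric values, and the `sort_key` helper are hypothetical, and only the ordering "c < ch" and the equivalence "ch = c[ZWGJ]h" are taken from the rules as given.

```python
ZWGB = "\u2060"  # proposed ZERO WIDTH GRAPHEME BREAK
ZWGJ = "\u2061"  # proposed ZERO WIDTH GRAPHEME JOIN

# Hypothetical primary weights for a small alphabet fragment.
# "ch" is a contraction placed directly after "c" (rule: c < ch).
WEIGHTS = {"a": 1, "b": 2, "c": 3, "ch": 4, "d": 5, "h": 6, "o": 7}
# Rule: ch = c[ZWGJ]h, so the explicitly joined spelling gets the same weight.
WEIGHTS["c" + ZWGJ + "h"] = WEIGHTS["ch"]

# Multi-character entries, longest first, so contractions win over singles.
CONTRACTIONS = sorted((k for k in WEIGHTS if len(k) > 1), key=len, reverse=True)


def sort_key(word):
    """Return a list of weights for `word` under the toy tailoring."""
    key = []
    i = 0
    while i < len(word):
        for seq in CONTRACTIONS:
            if word.startswith(seq, i):
                key.append(WEIGHTS[seq])
                i += len(seq)
                break
        else:
            ch = word[i]
            # Outside a contraction, ZWGB and ZWGJ are completely ignorable.
            if ch not in (ZWGB, ZWGJ):
                key.append(WEIGHTS[ch])  # toy table: unknown letters raise KeyError
            i += 1
    return key


words = ["cha", "c" + ZWGB + "ha", "cd"]
ordered = sorted(words, key=sort_key)
```

In this sketch "c[ZWGB]ha" is keyed as the separate letters c, h, a, so it sorts between "cd" and "cha", while "c[ZWGJ]ha" receives the same key as "cha".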

DecompositionClass no decomposition
CharacterDecomposition
CanonicalOrdering 0 - Spacing, split, enclosing, reordrant, and Tibetan subjoined
CanonicalCombiningClass 0
BidirectionalCategory ON - Other Neutrals
NumericType none - no numeric value
CanonicalCombiningClass1
CanonicalCombiningClass2
CanonicalCombiningClass3
CanonicalCombiningClass4
LineBreak IN - Inseparable
EastAsianWidth Neutral - does not occur in EA sets
CursiveShaping T - transparent to linking (non-spacing marks)
ComplexShaping Whether graphemes are joined or not should not otherwise affect cursive or ligature behavior in normal circumstances. Any exceptions to this should be specifically listed in the script descriptions.