L2/03-403

Date: October 30, 2003
Title: Draft language for consensus 96-C20 re CGJ
Author: Ken Whistler
Action: For consideration by UTC

=============================================================

At the last meeting of the UTC I was tasked as follows:

[96-C20] Consensus: Add text to Unicode 4.0.1 which points out that combining grapheme joiner has the effect of preventing the canonical re-ordering of combining marks during normalization. [L2/03-235, L2/03-236, L2/03-234]

[96-A72] Action Item for Ken Whistler: Draft language for consensus 96-C20 (on the effect of combining grapheme joiner to prevent canonical re-ordering of combining marks during normalization) for inclusion into Unicode 4.0.1 and create a FAQ describing this effect as well. [L2/03-235, L2/03-236, L2/03-234]

So here goes. This text has had preliminary review by the editorial committee. If there is no objection to the text I am suggesting, then I will later spiff it up to fit into the FAQ, as well.

=========================== draft text =======================

[ The following text is to be added to the explanation of the combining grapheme joiner on p. 392 of The Unicode Standard, Version 4.0.]

Formally, the combining grapheme joiner is not a format control character, but rather a combining mark. It has the General Category value gc=Mn and the combining class value ccc=0. These property assignments result in the following behavior, which can be useful in certain circumstances.

The presence of a combining grapheme joiner in the midst of a combining character sequence does not interrupt the combining character sequence; a process which is accumulating and processing all the characters of a combining character sequence would include a combining grapheme joiner as part of that sequence. (This differs from format control characters, whose presence would interrupt a combining character sequence.)
However, because the combining class of the combining grapheme joiner is 0, canonical reordering will not reorder any adjacent combining marks around a combining grapheme joiner. (See the definition of canonical reordering in Section 3.11, Canonical Reordering Behavior.) In turn, this means that insertion of a combining grapheme joiner between two combining marks will prevent normalization from switching the position of those two combining marks, regardless of their own combining classes.

This side-effect of the character properties of the combining grapheme joiner, together with the fact that the combining grapheme joiner has no visible glyph and no other format effect on neighboring characters, can be taken advantage of in those exceptional circumstances where two alternative orderings of a sequence of combining marks must be distinguished for some processing or rendering purpose and where normalization would otherwise level the distinction between the two sequences. For example, this is one way to avoid the less-than-optimal assignment of fixed-position combining classes to certain Hebrew accents and marks which do in fact interact typographically and for which accent order distinctions need to be maintained for analytic and text representational purposes.

====================== end draft ==========================
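
Illustration (not part of the draft text): the blocking behavior described above can be checked with a short Python sketch. It assumes Python's standard unicodedata module, whose property data comes from whatever Unicode version ships with the interpreter. The Hebrew sequence LAMED, PATAH, HIRIQ is chosen here only as an example, and the names nfd, code_points, plain, and guarded are hypothetical helpers introduced for this sketch.

```python
import unicodedata

CGJ = "\u034F"     # COMBINING GRAPHEME JOINER (gc=Mn, ccc=0)
LAMED = "\u05DC"   # HEBREW LETTER LAMED (base character, ccc=0)
PATAH = "\u05B7"   # HEBREW POINT PATAH (ccc=17)
HIRIQ = "\u05B4"   # HEBREW POINT HIRIQ (ccc=14)


def nfd(text: str) -> str:
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The two property values cited in the draft text.
cgj_properties = (unicodedata.category(CGJ), unicodedata.combining(CGJ))

# Without CGJ: PATAH (ccc=17) precedes HIRIQ (ccc=14), so canonical
# reordering swaps the two marks during normalization.
plain = LAMED + PATAH + HIRIQ
plain_nfd = nfd(plain)

# With CGJ between the marks: the ccc=0 character blocks the swap,
# so the original mark order survives normalization.
guarded = LAMED + PATAH + CGJ + HIRIQ
guarded_nfd = nfd(guarded)

if __name__ == "__main__":
    print("CGJ properties:", cgj_properties)
    print("plain   ->", code_points(plain_nfd))
    print("guarded ->", code_points(guarded_nfd))
```

In this sketch the unguarded sequence normalizes to the same string as LAMED, HIRIQ, PATAH, so the two orderings become indistinguishable, while the guarded sequence is left unchanged. NFD is used rather than NFC only to keep composition out of the picture; canonical reordering applies in both forms.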