L2/03-026

Re: Scope of enclosing marks
From: Mark Davis
Date: 2002-01-29

As the editorial committee was going through the text to implement the latest UTC decisions, we reached a passage where we did not have enough direction to make an (editorial) decision, so we need the UTC to decide.

Unicode 3.2 has the following text (I added paragraph numbers):

1. Enclosing Combining Marks. These marks enclose the entire preceding grapheme cluster. For example, in the following sequence the entire Hangul syllable is circled, not just part of it:

2. This is also true of grapheme clusters composed of elements linked by a Grapheme_Link or combining grapheme joiner. For example, the entire conjunct is circled in the following sequence:

3. On the other hand, where elements are linked by a Grapheme_Link or combining grapheme joiner, non-enclosing combining marks only apply to the last base character. For example, in the following sequence the nukta applies to the immediately preceding ddha, not to the entire cluster:

Since then, however, the UTC has decided to narrow the scope of grapheme clusters to a clear core, essentially:

(<hangul syllable> | <base> ) <non-spacing mark>*

[and the name is changed to "default grapheme cluster"]
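The narrowed rule can be read as a simple segmentation procedure. Below is a minimal Python sketch under stated assumptions, not the UTC's algorithm. The function names are hypothetical. "Non-spacing mark" is taken to mean General_Category Mn or Me, so that the enclosing circle U+20DD (Me) attaches to the preceding cluster. Hangul syllables are detected only from the conjoining jamo block and the precomposed syllables, using the L/V/T sequencing rules of UAX #29.

```python
import unicodedata

# Sketch of the narrowed rule
#     (<hangul syllable> | <base>) <non-spacing mark>*
# ASSUMPTIONS (not from the UTC text): "non-spacing mark" means
# General_Category Mn or Me; Hangul is recognized only in the conjoining
# jamo block U+1100..U+11FF and precomposed syllables U+AC00..U+D7A3.

def hangul_type(ch):
    """Classify ch as L, V, T, LV or LVT, or None if it is not Hangul."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"
    if 0x1160 <= cp <= 0x11A7:
        return "V"
    if 0x11A8 <= cp <= 0x11FF:
        return "T"
    if 0xAC00 <= cp <= 0xD7A3:
        # every 28th precomposed syllable has no trailing consonant
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return None

def hangul_continues(prev, cur):
    """True if Hangul type cur extends a syllable that ends in type prev."""
    if prev == "L":
        return cur in ("L", "V", "LV", "LVT")
    if prev in ("LV", "V"):
        return cur in ("V", "T")
    if prev in ("LVT", "T"):
        return cur == "T"
    return False

def is_mark(ch):
    return unicodedata.category(ch) in ("Mn", "Me")

def default_grapheme_clusters(text):
    """Split text into clusters of (<hangul syllable> | <base>) <mark>*."""
    clusters = []
    prev_hangul = None  # Hangul type of the previous character, if any
    for ch in text:
        kind = hangul_type(ch)
        if clusters and is_mark(ch):
            clusters[-1] += ch   # marks attach to the current cluster
            prev_hangul = None   # ...and end any Hangul syllable
        elif clusters and kind and prev_hangul and hangul_continues(prev_hangul, kind):
            clusters[-1] += ch   # still inside one Hangul syllable
            prev_hangul = kind
        else:
            clusters.append(ch)  # anything else starts a new cluster
            prev_hangul = kind
    return clusters

print(default_grapheme_clusters("\u1112\u1161\u11ab\u20dd"))
print(default_grapheme_clusters("1\u034f2\u20dd"))
```

Read literally, this rule keeps the jamo sequence for the syllable 한 plus a circle in one cluster, but splits "1" + U+034F + "2" + U+20DD into `["1\u034f", "2\u20dd"]`, because COMBINING GRAPHEME JOINER is itself an Mn mark that closes the first cluster. KA + VIRAMA + DDHA splits after the virama in the same way.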

That means that paragraphs #2 and #3 above no longer work, and the UTC has to decide how to fix them. We have split the question into two parts, because the answer could conceivably differ for a virama and for the combining grapheme joiner.

  1. Given the sequence "1" + grapheme_joiner + "2" + enclosing_circle, should the circle enclose the three previous characters or only the "2"?
  2. Given the sequence KA + VIRAMA + DDHA + enclosing_circle, should the circle enclose the three previous characters or only the DDHA?
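The two options in each question amount to two ways of measuring the scope of an enclosing mark. The following Python sketch models both side by side. It is illustrative only and does not presume which option the UTC will choose. The helper name `enclosed_span` is hypothetical. Treating COMBINING GRAPHEME JOINER (U+034F) and DEVANAGARI SIGN VIRAMA (U+094D) as the only linking characters is a simplifying assumption standing in for Grapheme_Link plus CGJ. Treating Mn/Me as marks is also an assumption.

```python
import unicodedata

CGJ = "\u034F"     # COMBINING GRAPHEME JOINER
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
# ASSUMPTION: these two stand in for all Grapheme_Link characters plus CGJ.
LINKERS = {CGJ, VIRAMA}

def is_mark(ch):
    # ASSUMPTION: combining marks are General_Category Mn or Me.
    return unicodedata.category(ch) in ("Mn", "Me")

def enclosed_span(text, mark_index, follow_links):
    """Return the text enclosed by the enclosing mark at text[mark_index].

    follow_links=False: only the nearest preceding base and its marks.
    follow_links=True: also extend backwards across <base> <mark>* <linker>.
    """
    if mark_index <= 0:
        return ""
    # Walk back over marks to find the nearest base character.
    start = mark_index - 1
    while start > 0 and is_mark(text[start]):
        start -= 1
    if follow_links:
        # Extend across each linker to the base that precedes it.
        while start >= 2 and text[start - 1] in LINKERS:
            j = start - 2
            while j > 0 and is_mark(text[j]) and text[j] not in LINKERS:
                j -= 1
            if is_mark(text[j]):
                break  # no base before the linker; stop extending
            start = j
    return text[start:mark_index]

for seq in ("1\u034f2\u20dd", "\u0915\u094d\u0922\u20dd"):
    circle = seq.index("\u20dd")
    print([hex(ord(c)) for c in enclosed_span(seq, circle, False)],
          [hex(ord(c)) for c in enclosed_span(seq, circle, True)])
```

With `follow_links=False` the circle covers only the "2" or the DDHA; with `follow_links=True` it covers all three preceding characters. The two questions above ask which of these behaviors the standard should specify for the joiner and for the virama respectively.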