L2/07-153

Subject:	Compatibility decomposition descriptions in UAX #15
Date:		May 7, 2007
From:		Ienup Sung

1. Issue:

Revision 27 of UAX #15 and the proposed update (revision 28) both give
the following descriptions of canonical decomposition and compatibility
decomposition in Section 10:

	Canonical decomposition is the process of taking a string,
	recursively replacing composite characters using the Unicode canonical
	decomposition mappings (including the algorithmic Hangul canonical
	decomposition mappings; see Section 16, Hangul), and putting
	the result in canonical order.

	Compatibility decomposition is the process of taking a string,
	replacing composite characters using both the Unicode canonical
	decomposition mappings and the Unicode compatibility decomposition
	mappings, and putting the result in canonical order.

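I think the compatibility decomposition description above is a possible
source of confusion for less careful readers. Unlike the canonical
decomposition description, it does not say "recursively" and does not
mention the algorithmic Hangul canonical decomposition mappings, so it
differs from D65 (D20 in older versions):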

    D65	Compatibility decomposition: The decomposition of a character that
	results from recursively applying both the compatibility mappings and
	the canonical mappings found in the Unicode Character Database, and
	those described in Section 3.12, Conjoining Jamo Behavior, until no
	characters can be further decomposed, and then reordering nonspacing
	marks according to Section 3.11, Canonical Ordering Behavior.
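
The process that D65 describes can be sketched in Python as follows. This
is only an illustration, not text proposed for UAX #15: the helper names
(decompose_hangul, decompose_char, canonical_order, decompose) are
hypothetical, the character data comes from Python's standard unicodedata
module, and the Hangul constants are the usual ones for the algorithmic
syllable decomposition.

```python
import unicodedata

# Constants for the algorithmic Hangul syllable decomposition.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588
S_COUNT = 19 * N_COUNT        # 11172

def decompose_hangul(ch):
    """Return the jamo sequence for a Hangul syllable, or None otherwise."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return None
    out = chr(L_BASE + s_index // N_COUNT)
    out += chr(V_BASE + (s_index % N_COUNT) // T_COUNT)
    if s_index % T_COUNT:
        out += chr(T_BASE + s_index % T_COUNT)
    return out

def decompose_char(ch, compat):
    """Recursively apply the mappings until nothing decomposes further."""
    hangul = decompose_hangul(ch)
    if hangul is not None:
        return hangul
    fields = unicodedata.decomposition(ch).split()
    if not fields:
        return ch
    if fields[0].startswith("<"):      # tagged, so a compatibility mapping
        if not compat:
            return ch
        fields = fields[1:]
    return "".join(decompose_char(chr(int(f, 16)), compat) for f in fields)

def canonical_order(s):
    """Stable-sort each run of non-starters by canonical combining class."""
    out, i = [], 0
    while i < len(s):
        j = i + 1
        if unicodedata.combining(s[i]):
            while j < len(s) and unicodedata.combining(s[j]):
                j += 1
        out.extend(sorted(s[i:j], key=unicodedata.combining))
        i = j
    return "".join(out)

def decompose(s, compat):
    """compat=False sketches NFD, compat=True sketches NFKD."""
    return canonical_order("".join(decompose_char(c, compat) for c in s))
```

Under these assumptions, decompose(s, True) should agree with
unicodedata.normalize("NFKD", s), including for Hangul syllables and for
characters such as U+01C4 whose compatibility mapping contains a character
that has a canonical mapping of its own.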

This possible confusion is amplified by the following sentence in the
"Hangul Composition" subsection of Section 16, which does not mention
NFKD:

	Notice an important feature of Hangul composition: whenever the source
	string is not in Normalization Form D, one cannot just detect
	character sequences of the form <L, V> and <L, V, T>. 
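
The situation that sentence describes can be illustrated with a short
Python check. It is a sketch that relies on Python's standard unicodedata
module; the names LV_T, L_V_T and is_in_form are hypothetical and chosen
only for this example.

```python
import unicodedata

LV_T = "\uac00\u11a8"          # <LV, T>: precomposed syllable + trailing jamo
L_V_T = "\u1100\u1161\u11a8"   # <L, V, T>: fully decomposed jamo sequence

def is_in_form(form, s):
    """True if s is unchanged by normalization to the given form."""
    return unicodedata.normalize(form, s) == s

# The <LV, T> string is in neither decomposed form, yet it still composes
# to a single LVT syllable, so scanning only for <L, V> and <L, V, T>
# would miss it.
in_nfd = is_in_form("NFD", LV_T)
in_nfkd = is_in_form("NFKD", LV_T)
composed = unicodedata.normalize("NFC", LV_T)

# Both decomposed forms of the same text are the plain jamo sequence.
nfd = unicodedata.normalize("NFD", LV_T)
nfkd = unicodedata.normalize("NFKD", LV_T)
```

Here both the NFD and the NFKD result are the <L, V, T> sequence, which is
the case in which detecting <L, V> and <L, V, T> alone is sufficient.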

2. Proposal:

I propose changing the compatibility decomposition description in
Section 10 of UAX #15 to something like the following (changed lines
are marked with "|"):

	Compatibility decomposition is the process of taking a string,
	recursively replacing composite characters using both the Unicode      |
	canonical decomposition mappings (including the algorithmic Hangul     |
	canonical decomposition mappings) and the Unicode compatibility        |
	decomposition mappings, and putting the result in canonical order.
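
To show why the word "recursively" matters in this wording, the following
Python sketch compares a single, non-recursive pass over the mappings with
the NFKD result. The function one_pass is a hypothetical helper written
only for this comparison; it reads the mappings from Python's standard
unicodedata module, which does not list the algorithmic Hangul mappings.

```python
import unicodedata

def one_pass(s):
    """Apply each character's decomposition mapping once, without recursion."""
    out = []
    for ch in s:
        fields = unicodedata.decomposition(ch).split()
        if fields and fields[0].startswith("<"):
            fields = fields[1:]          # drop the compatibility tag
        if fields:
            out.append("".join(chr(int(f, 16)) for f in fields))
        else:
            out.append(ch)
    return "".join(out)

# U+01C4 maps to <compat> 0044 017D, and U+017D in turn maps to 005A 030C.
single = one_pass("\u01c4")                       # stops at D + U+017D
full = unicodedata.normalize("NFKD", "\u01c4")    # D + Z + U+030C

# A Hangul syllable has no listed mapping, so the table-driven pass leaves
# it alone, while NFKD applies the algorithmic decomposition.
hangul_single = one_pass("\uac00")
hangul_full = unicodedata.normalize("NFKD", "\uac00")
```

A reader who takes the current Section 10 wording literally could arrive
at the single-pass result, which is not the compatibility decomposition
defined by D65.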

and changing the sentence in the "Hangul Composition" subsection of
Section 16 to something like the following:

	Notice an important feature of Hangul composition: whenever the source
	string is not in Normalization Form D or Normalization Form KD,        |
	one cannot just detect character sequences of the form <L, V> and
	<L, V, T>. 

END_OF_MEMO.