Re: Can the combining diacritical marks combine with any base character?

From: Eric Muller
Date: Tue, 12 Feb 2013 14:19:57 -0800

On 2/11/2013 12:49 AM, Richard Wordingham wrote:
> The problem sequence is <U+003E GREATER-THAN SIGN, U+0338 COMBINING LONG
> SOLIDUS OVERLAY> which is canonically equivalent to <U+226F NOT

Which demonstrates: NFC applied to the serialization of an XML infoset
is not the same as NFC applied to the text nodes and attributes of that

> The short answer is that XML shall not do canonical
> equivalence, at least, not on data; so doing would corrupt some of the
> CLDR definitions,

That case is different: the question there is whether a particular use
of text strings (CLDR in this case) can be indifferent to normalization.
There are other such cases, e.g. the regular expressions used to
validate some of Unihan's properties, which should not be normalized,
and which assume that the data to be validated is in NFD.
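
To illustrate why normalizing such an expression changes what it means, here is a small sketch in Python. The pattern is invented for the example and is not one of the actual Unihan validation expressions; it only assumes, as above, that the data to be validated is in NFD.

```python
import re
import unicodedata

# Hypothetical pattern written against NFD data: the letter 'e',
# optionally followed by U+0301 COMBINING ACUTE ACCENT.
pattern = "^e\u0301?$"

# Normalizing the pattern itself composes 'e' + U+0301 into U+00E9,
# so the '?' now applies to the precomposed letter instead of the accent.
normalized_pattern = unicodedata.normalize("NFC", pattern)

nfd_data = unicodedata.normalize("NFD", "\u00e9")  # 'e' + U+0301
nfc_data = "\u00e9"

print(bool(re.match(pattern, nfd_data)))        # True
print(bool(re.match(pattern, nfc_data)))        # False
print(bool(re.match(pattern, "e")))             # True
print(bool(re.match(normalized_pattern, "e")))  # False
```

The unnormalized pattern accepts NFD data and rejects the NFC form, while the normalized pattern no longer accepts a bare "e" at all.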

Received on Tue Feb 12 2013 - 16:21:38 CST

