Re: Can the combining diacritical marks combine with any base character?

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Mon, 11 Feb 2013 08:49:43 +0000

On Sun, 10 Feb 2013 16:18:23 -0800
David Starner <prosfilaes_at_gmail.com> wrote:

> On Sun, Feb 10, 2013 at 3:46 PM, Costello, Roger L.
> <costello_at_mitre.org> wrote:
> > Hi Folks,
> >
> > Can the combining diacritical marks combine with any base character?
>
> Yes.
>
> > If yes, wouldn't normalizing this:
> >
> > <comment>(U+0303)
> >
> > to NFC result in converting the XML start tag into non-well-formed
> > XML? (It is not well-formed because there is no longer a '>'
> > character after the tag name; rather, there is a '>' character with
> > a tilde on top.)
>
> Normalizing it to NFC would change nothing, since there's no
> precomposed '>' + diacritic characters.

The problem sequence is <U+003E GREATER-THAN SIGN, U+0338 COMBINING LONG
SOLIDUS OVERLAY>, which is canonically equivalent to <U+226F NOT
GREATER-THAN>. The short answer is that XML shall not apply canonical
equivalence, at least not to data; doing so would corrupt some of the
CLDR definitions, e.g. exemplar characters (TR 35 Section 5.6). The XML
specification addresses how to avoid an inadvertent ≯.
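
The two cases can be checked with a minimal Python sketch, assuming the
standard-library unicodedata module; the variable names are purely
illustrative. It normalizes the end of the start tag followed by each
combining mark and compares the results.

```python
import unicodedata

# End of the start tag followed by a combining mark (the two cases discussed).
with_tilde = "<comment>\u0303"    # '>' + U+0303 COMBINING TILDE
with_solidus = "<comment>\u0338"  # '>' + U+0338 COMBINING LONG SOLIDUS OVERLAY

nfc_tilde = unicodedata.normalize("NFC", with_tilde)
nfc_solidus = unicodedata.normalize("NFC", with_solidus)

# No precomposed '>' + tilde exists, so NFC leaves the text alone.
print(nfc_tilde == with_tilde)

# '>' + U+0338 composes to U+226F NOT GREATER-THAN, so the '>' disappears.
print(nfc_solidus == "<comment\u226f")
print(">" in nfc_solidus)
```

The first two comparisons should hold and the last should not, i.e.
after NFC the second string no longer contains the '>' that would close
the tag.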

Richard.
Received on Mon Feb 11 2013 - 02:52:39 CST