Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Thu, 17 May 2012 21:02:46 +0100

On Wed, 16 May 2012 16:03:08 -0700
Markus Scherer <markus.icu_at_gmail.com> wrote:

> The problem is a contraction x+0F72 and input text x+0F73 where the
> inner 0F71 should be skipped. We can avoid this by adding a
> contraction for x+0F73 (and one for the equivalent x+0F71+0F72).
>
> On the other hand, x+0F73 (together with x+0F71+0F72) is harmless, it
> does not match the second half of anything else. Separately, we
> should have the prefix contraction x+0F71 so that discontiguous
> contractions match as expected, but we don't need x+0F72.

This isn't the end of the story. As x = 0F71, we also need the
contractions of x+0F73 (or x+0F71+0F72) with 0F72, 0F74 and 0F80 to
give the pair of long vowels. We don't need to worry about
<x+0F73,0F73> because that is not FCD.
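
To illustrate the FCD point, here is a minimal Python sketch using only the standard unicodedata module. The helper names lead_trail_ccc and is_fcd are made up for this illustration, and the check is the usual "trailing combining class of one character must not exceed the non-zero leading combining class of the next" test, not code from any collation implementation. Taking x = 0F71, it suggests that x+0F73 followed by 0F72, 0F74 or 0F80 passes the FCD check, while a second 0F73 does not.

```python
import unicodedata

def lead_trail_ccc(ch):
    """Return (lead, trail) canonical combining classes of ch's full NFD form."""
    nfd = unicodedata.normalize("NFD", ch)
    return unicodedata.combining(nfd[0]), unicodedata.combining(nfd[-1])

def is_fcd(text):
    """Rough FCD test: no character's decomposition starts with a non-zero
    combining class lower than the trailing class of the previous one."""
    prev_trail = 0
    for ch in text:
        lead, trail = lead_trail_ccc(ch)
        if lead != 0 and prev_trail > lead:
            return False
        prev_trail = trail
    return True

# Assumption for illustration only: x is taken to be U+0F71.
x = "\u0f71"
for tail in ("\u0f72", "\u0f74", "\u0f80", "\u0f73"):
    s = x + "\u0f73" + tail
    print(" ".join(f"{ord(c):04X}" for c in s), "FCD" if is_fcd(s) else "not FCD")
```

The last case fails because 0F73 decomposes to 0F71+0F72, so its trailing class (130) is higher than the leading class (129) of the 0F73 that follows.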

At least this is not an infinite sequence of contractions, unlike my
hypothetical example of a contraction for combining circumflex + g. For
that, I think the solution is to decompose anything containing a
trailing combining circumflex.
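
One way to read "decompose anything containing a trailing combining circumflex" is sketched below in Python. This is only my reading of the idea, not an existing API: the function name decompose_trailing_circumflex is hypothetical, and it simply replaces each character whose full canonical decomposition ends in U+0302 by that decomposition, leaving everything else untouched, so that a contraction starting with the circumflex can see it.

```python
import unicodedata

CIRCUMFLEX = "\u0302"  # COMBINING CIRCUMFLEX ACCENT

def decompose_trailing_circumflex(text):
    """Decompose only those characters whose NFD form ends in U+0302."""
    out = []
    for ch in text:
        nfd = unicodedata.normalize("NFD", ch)
        if len(nfd) > 1 and nfd.endswith(CIRCUMFLEX):
            out.append(nfd)   # expose the trailing circumflex
        else:
            out.append(ch)    # leave the character as it was
    return "".join(out)

# U+00E2 (a with circumflex) followed by g: the circumflex becomes visible
# next to the g. U+1EA5 ends in an acute after decomposition, so it is kept.
for s in ("\u00e2g", "\u1ea5g"):
    result = decompose_trailing_circumflex(s)
    print(" ".join(f"{ord(c):04X}" for c in result))
```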

Richard.
Received on Thu May 17 2012 - 15:04:57 CDT
