Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

From: Markus Scherer
Date: Thu, 17 May 2012 13:39:08 -0700

On Thu, May 17, 2012 at 1:02 PM, Richard Wordingham wrote:

> As x = 0F71, we also need the
> contractions of x+0F73 (or x+0F71+0F72) with 0F72, 0F74 and 0F80 to
> give the pair of long vowels. We don't need to worry about
> <x+0F73,0F73> because that is not FCD.
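
(Illustration, not part of the original message.) The FCD claim in the quoted text can be checked mechanically. The sketch below assumes the FCD condition as described in Unicode Technical Note #5: for each character, the trailing combining class of the previous character's canonical decomposition must not exceed the leading combining class of this character's decomposition, unless that leading class is 0. The helper names lead_trail_ccc and is_fcd are hypothetical, and the result depends on the Unicode data shipped with the Python build.

```python
import unicodedata

def lead_trail_ccc(ch):
    """Return (lead ccc, trail ccc) of the canonical decomposition (NFD) of ch."""
    nfd = unicodedata.normalize("NFD", ch)
    return unicodedata.combining(nfd[0]), unicodedata.combining(nfd[-1])

def is_fcd(s):
    """Hypothetical helper: True if s satisfies the FCD condition."""
    prev_trail = 0
    for ch in s:
        lead, trail = lead_trail_ccc(ch)
        # A non-zero lead ccc lower than the previous trail ccc breaks FCD.
        if lead != 0 and prev_trail > lead:
            return False
        prev_trail = trail
    return True

x = "\u0f71"  # TIBETAN VOWEL SIGN AA, the x of the quoted text

# <x+0F73, 0F73>: 0F73 decomposes to 0F71+0F72, so its trail ccc
# is higher than the lead ccc of the following 0F73.
print(is_fcd(x + "\u0f73" + "\u0f73"))

# The three followers named in the quoted text.
for follower in ("\u0f72", "\u0f74", "\u0f80"):
    print(f"{ord(follower):04X}", is_fcd(x + "\u0f73" + follower))
```

Under these assumptions, the sequence ending in a second 0F73 fails the check, while the sequences ending in 0F72, 0F74 and 0F80 pass it, which is consistent with the quoted text treating only the latter three as needing contractions.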

I am not following.

Given contractions

0F71+0F71 (needed as a prefix of the next one)

what other contractions do we need to add to avoid which problem?

> At least this is not an infinite sequence of contractions, unlike my
> hypothetical example of a contraction for combining circumflex + g. For
> that, I think the solution is to decompose anything containing a
> trailing combining circumflex.

In principle, you are right. However, such a contraction is such a weird
case that I think we could just forbid it. That is, forbid a set of
contractions that would cause us to add infinite overlap contractions.

I know of only one contraction in DUCET+CLDR that contracts a non-starter
plus a starter (1037+1038 = Myanmar signs dot below [ccc=7] + visarga), and
this contraction does not overlap with any decomposition mapping.
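
(Illustration, not part of the original message.) The properties mentioned for 1037+1038 can be inspected with Python's standard unicodedata module. This is only a sketch: the helper names is_starter and chars_decomposing_to are hypothetical, the scan covers canonical decompositions only, and the result reflects whatever Unicode version the Python build ships.

```python
import unicodedata

def is_starter(ch):
    """A starter is a character with canonical combining class 0."""
    return unicodedata.combining(ch) == 0

def chars_decomposing_to(targets):
    """Hypothetical helper: code points whose canonical decomposition (NFD)
    differs from the character itself and contains any of the targets."""
    hits = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:  # skip surrogate code points
            continue
        ch = chr(cp)
        nfd = unicodedata.normalize("NFD", ch)
        if nfd != ch and any(t in nfd for t in targets):
            hits.append(cp)
    return hits

dot_below = "\u1037"  # MYANMAR SIGN DOT BELOW
visarga = "\u1038"    # MYANMAR SIGN VISARGA

print(unicodedata.combining(dot_below), is_starter(dot_below))
print(unicodedata.combining(visarga), is_starter(visarga))

# Any decomposition mapping that the contraction could overlap with
# would show up in this list.
print(chars_decomposing_to({dot_below, visarga}))
```

An empty list from the last call would match the statement that the contraction does not overlap with any decomposition mapping.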


Google Internationalization Engineering
Received on Thu May 17 2012 - 15:41:15 CDT
