Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

From: Markus Scherer <markus.icu_at_gmail.com>
Date: Thu, 17 May 2012 13:39:08 -0700

On Thu, May 17, 2012 at 1:02 PM, Richard Wordingham <
richard.wordingham_at_ntlworld.com> wrote:

> As x = 0F71, we also need the
> contractions of x+0F73 (or x+0F71+0F72) with 0F72, 0F74 and 0F80 to
> give the pair of long vowels. We don't need to worry about
> <x+0F73,0F73> because that is not FCD.
>

I am not following.

Given contractions

0F71+0F71 (needed as a prefix of the next one)
0F71+0F71+0F72
0F71+0F73

what other contractions do we need to add to avoid which problem?
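
For readers following along, here is a minimal Python sketch (not part of the original message) that inspects the three contractions listed above using only the standard unicodedata module. The helper name is_fcd is made up for illustration; it applies the usual FCD test, in which each character's leading combining class must not be lower than the previous character's trailing class unless it is zero. The results depend on the Unicode data shipped with the Python build.

```python
import unicodedata

def is_fcd(s):
    """Hypothetical helper: True if s passes the FCD check."""
    prev_trail = 0
    for ch in s:
        nfd = unicodedata.normalize("NFD", ch)
        lead = unicodedata.combining(nfd[0])    # ccc of first char of the decomposition
        trail = unicodedata.combining(nfd[-1])  # ccc of last char of the decomposition
        if lead != 0 and prev_trail > lead:
            return False
        prev_trail = trail
    return True

# The three contractions from the message, as strings.
contractions = ["\u0F71\u0F71", "\u0F71\u0F71\u0F72", "\u0F71\u0F73"]

# 0F73 decomposes canonically to 0F71 + 0F72, so the last two
# contractions are canonically equivalent spellings of one string.
same_nfd = (unicodedata.normalize("NFD", contractions[1])
            == unicodedata.normalize("NFD", contractions[2]))

# FCD status of each contraction string.
fcd_flags = [is_fcd(c) for c in contractions]

# The sequence <0F71+0F73, 0F73> from the quoted text.
quoted_case_is_fcd = is_fcd("\u0F71\u0F73\u0F73")
```

Under these assumptions, the second and third contractions come out as two FCD spellings of the same canonical string, which is why both appear in the list, while the quoted sequence <0F71+0F73, 0F73> fails the check.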

> At least this is not an infinite sequence of contractions, unlike my
> hypothetical example of a contraction for combining circumflex + g. For
> that, I think the solution is to decompose anything containing a
> trailing combining circumflex.
>

In principle, you are right. However, a contraction like that is so
unusual that I think we could just forbid it. That is, forbid any set of
contractions that would force us to add infinitely many overlap contractions.

I know of only one contraction in DUCET+CLDR that contracts a non-starter
plus a starter (1037+1038 = Myanmar signs dot below [ccc=7] + visarga), and
this contraction does not overlap with any decomposition mapping.
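
As a rough illustration (again not from the original message), the properties of that Myanmar contraction can be checked with Python's unicodedata module. The function name chars_whose_mapping_contains is hypothetical, and the scan is only a coarse stand-in for the overlap test: it asks whether either character appears in any decomposition mapping in the Unicode data bundled with Python.

```python
import unicodedata

contraction = "\u1037\u1038"  # Myanmar sign dot below + sign visarga

# Canonical combining classes of the two characters.
ccc = [unicodedata.combining(ch) for ch in contraction]
is_nonstarter_plus_starter = ccc[0] != 0 and ccc[1] == 0

def chars_whose_mapping_contains(targets):
    """Hypothetical helper: code points whose decomposition mapping
    (canonical or compatibility) mentions any code point in targets."""
    hits = []
    for cp in range(0x110000):
        fields = unicodedata.decomposition(chr(cp)).split()
        # Skip compatibility tags such as "<compat>"; keep the hex fields.
        mapped = {int(f, 16) for f in fields if not f.startswith("<")}
        if mapped & targets:
            hits.append(cp)
    return hits

overlaps = chars_whose_mapping_contains({0x1037, 0x1038})
```

An empty overlaps list, together with neither character having a decomposition of its own, is consistent with the statement that this contraction does not interact with any decomposition mapping.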

markus

-- 
Google Internationalization Engineering
Received on Thu May 17 2012 - 15:41:15 CDT