Re: Compliant Tailoring of Normalisation for the Unicode Collation Algorithm

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Sun, 20 May 2012 02:29:05 +0100

On Sat, 19 May 2012 01:12:17 +0100
Richard Wordingham <richard.wordingham_at_ntlworld.com> wrote:

> This will then work for DUCET
> 6.1.0, work for Danish, and work for my mischievous 0302 COMBINING
> CIRCUMFLEX ACCENT+0067 LATIN SMALL LETTER G contraction.

There is a very similar rule in CLDR for Lithuanian: the contraction
0307+0301 has CE(0301), and similarly for 0307 followed by U+0300 or
U+0303. This deals with the problem that Lithuanian 'i' is
typographically hard-dotted, and therefore sprouts U+0307 when an
accent is placed on it. Now, Lithuanian also has U+0117 LATIN SMALL
LETTER E WITH DOT ABOVE, which decomposes to <0065, 0307>, so the
contraction should cause problems for <0117, 0301>. That is prevented
by having an apparently do-little contraction for U+0117!
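
To make the interaction concrete, here is a rough Python sketch. It is
only a toy: the matcher is a plain greedy longest-match over the NFD
string, not the UCA's real matching (which also handles discontiguous
contractions), and the table entries and labels such as "CE(0117)" are
hypothetical stand-ins, not actual DUCET or CLDR data.

```python
import unicodedata

# Toy tables: keys are NFD substrings, values are symbolic labels.
# All entries are illustrative assumptions, not real collation weights.
BASE = {
    "e": ["CE(0065)"],
    "i": ["CE(0069)"],
    "\u0307": ["CE(0307)"],
    "\u0301": ["CE(0301)"],
}
DOT_ACUTE = {"\u0307\u0301": ["CE(0301)"]}  # 0307+0301 has CE(0301)
E_DOT = {"e\u0307": ["CE(0117)"]}           # stand-in for the U+0117 entry


def toy_elements(text, table):
    """Greedy longest-match lookup over the NFD form of text."""
    s = unicodedata.normalize("NFD", text)
    longest = max(len(key) for key in table)
    out, i = [], 0
    while i < len(s):
        for n in range(longest, 0, -1):
            chunk = s[i:i + n]
            if chunk in table:
                out.extend(table[chunk])
                i += len(chunk)
                break
        else:
            raise KeyError(f"no toy entry for U+{ord(s[i]):04X}")
    return out


without_fix = {**BASE, **DOT_ACUTE}
with_fix = {**BASE, **DOT_ACUTE, **E_DOT}

# <0117, 0301> normalises to <0065, 0307, 0301>, so the dot is swallowed
# and the result is the same as for plain e + acute.
print(toy_elements("\u0117\u0301", without_fix))

# With an entry covering <0065, 0307>, the dot is consumed together with
# the 'e' and 0301 is left to stand alone.
print(toy_elements("\u0117\u0301", with_fix))

# The Lithuanian dotted i with acute still gets the intended treatment.
print(toy_elements("i\u0307\u0301", with_fix))
```

In this toy model the entry for <0065, 0307> does nothing visible on its
own, yet it stops the 0307+0301 contraction from firing inside
<0117, 0301>, which is the effect described above.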

Richard.
Received on Mon May 21 2012 - 22:14:37 CDT
