From: Bjoern Hoehrmann (derhoermi@gmx.net)
Date: Tue Sep 22 2009 - 16:11:55 CDT
* Jeff Senn wrote:
>case 1: 1B11, 1B35
>case 2: 0CCA, 0CD5
>
>There are (non-compatibility) decompositions for both of these
>sequences:
>
> 1B12 --> 1B11, 1B35
> 0CCB --> 0CCA, 0CD5
>
>All of these characters have combining class 0. Can they be canonically
>combined? Even though the 2nd characters are NOT "combining"?
My reading is that they most certainly can be, and that is what the latest
implementation I have written does (it is a very literal, data-driven
implementation). See the implementation and test script at
http://lists.w3.org/Archives/Public/www-archive/2009Feb/0015.html
and the Perl script that turns the Unicode XML database into the SQLite
database the implementation needs:
http://lists.w3.org/Archives/Public/www-archive/2009Feb/0014.html
The NormalizationTest.txt file has the cases
1B12;1B12;1B11 1B35;1B12;1B11 1B35;
0CCB;0CCB;0CC6 0CC2 0CD5;0CCB;0CC6 0CC2 0CD5;
0CCA;0CCA;0CC6 0CC2;0CCA;0CC6 0CC2;
Note that
# NFC
# c2 == NFC(c1) == NFC(c2) == NFC(c3)
# c4 == NFC(c4) == NFC(c5)
#
# NFD
# c3 == NFD(c1) == NFD(c2) == NFD(c3)
# c5 == NFD(c4) == NFD(c5)
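To make those invariants concrete, here is a minimal sketch that checks them for the three test lines quoted above. It uses Python's standard unicodedata module rather than my SQLite-based implementation; the helper names parse_fields and check_line are made up for illustration.

```python
import unicodedata

# The three NormalizationTest.txt data lines quoted above.
LINES = [
    "1B12;1B12;1B11 1B35;1B12;1B11 1B35;",
    "0CCB;0CCB;0CC6 0CC2 0CD5;0CCB;0CC6 0CC2 0CD5;",
    "0CCA;0CCA;0CC6 0CC2;0CCA;0CC6 0CC2;",
]

def parse_fields(line):
    """Split a data line into its five columns c1..c5 as strings."""
    columns = line.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in col.split()) for col in columns]

def check_line(line):
    """Return True if the NFC and NFD invariants hold for this line."""
    c1, c2, c3, c4, c5 = parse_fields(line)

    def nfc(s):
        return unicodedata.normalize("NFC", s)

    def nfd(s):
        return unicodedata.normalize("NFD", s)

    return (
        c2 == nfc(c1) == nfc(c2) == nfc(c3)
        and c4 == nfc(c4) == nfc(c5)
        and c3 == nfd(c1) == nfd(c2) == nfd(c3)
        and c5 == nfd(c4) == nfd(c5)
    )

for line in LINES:
    print(line, "ok" if check_line(line) else "FAILED")
```

The same check_line function should work on any data line of the file once comments and the "@Part" headers are stripped, assuming the Unicode version bundled with the Python build covers the characters involved.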
Also note that the decomposition you give for U+0CCB is only the first
step: since U+0CCA itself decomposes into U+0CC6 U+0CC2, the full canonical
decomposition (NFD) of U+0CCB is the sequence U+0CC6 U+0CC2 U+0CD5. Perhaps
that is a source of confusion?
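As a rough illustration of the difference between the mapping recorded for a single character and its full decomposition, the following sketch queries Python's unicodedata module. The helper names one_step and full are invented here, and the output depends on the Unicode data bundled with the Python build.

```python
import unicodedata

def one_step(ch):
    """Return the canonical decomposition mapping recorded for ch (hex strings)."""
    mapping = unicodedata.decomposition(ch)
    # Compatibility mappings start with a tag such as "<compat>"; skip those.
    if not mapping or mapping.startswith("<"):
        return []
    return mapping.split()

def full(ch):
    """Return the full canonical decomposition (NFD) of ch as hex strings."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]

for cp in (0x1B12, 0x0CCA, 0x0CCB):
    ch = chr(cp)
    print(f"{cp:04X}: one step {one_step(ch)}, full {full(ch)}")

# All of the characters involved have canonical combining class 0.
for cp in (0x1B11, 0x1B35, 0x0CC6, 0x0CC2, 0x0CD5):
    print(f"ccc({cp:04X}) = {unicodedata.combining(chr(cp))}")
```

For U+0CCB the recorded mapping has two code points while the full decomposition has three, because the first code point of the mapping decomposes further.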
>So, if the answer is indeed "YES", one might add
>
>case 3: 0CCA, 0300, 0CD5 (admittedly unusual)
>
>which clearly should not compose since ccc(0300) >= ccc(0CD5)
>(http://www.unicode.org/review/pr-29.html)
I agree with this as well.
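The three cases can be tried against any conforming normalizer. Below is a small sketch using Python's unicodedata module as a stand-in (not my own implementation); nfc_hex is a hypothetical helper, and the sketch assumes the library follows the corrected definition from PR-29.

```python
import unicodedata

def nfc_hex(code_points):
    """Normalize a code point sequence to NFC and return it as hex strings."""
    text = "".join(chr(cp) for cp in code_points)
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFC", text)]

case1 = nfc_hex([0x1B11, 0x1B35])          # two starters, both ccc 0
case2 = nfc_hex([0x0CCA, 0x0CD5])          # two starters, both ccc 0
case3 = nfc_hex([0x0CCA, 0x0300, 0x0CD5])  # U+0300 (ccc 230) sits in between

print("case 1:", case1)
print("case 2:", case2)
print("case 3:", case3)
```

In cases 1 and 2 the second character is adjacent to the starter, so nothing blocks the composition. In case 3 the intervening U+0300 blocks U+0CD5 from the starter, so the sequence should come back unchanged.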
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/