Re: NFC/NFKC Normalization Edge Case

From: Bjoern Hoehrmann (derhoermi@gmx.net)
Date: Tue Sep 22 2009 - 16:11:55 CDT

Next message: Kenneth Whistler: "Re: NFC/NFKC Normalization Edge Case"

Previous message: Jeff Senn: "NFC/NFKC Normalization Edge Case"
In reply to: Jeff Senn: "NFC/NFKC Normalization Edge Case"
Next in thread: Kenneth Whistler: "Re: NFC/NFKC Normalization Edge Case"

* Jeff Senn wrote:
>case 1: 1B11, 1B35
>case 2: 0CCA, 0CD5
>
>There are (non-compatibility) decompositions for both of these
>sequences:
>
> 1B12 --> 1B11, 1B35
> 0CCB --> 0CCA, 0CD5
>
>All of these characters have combining class 0. Can they be canonically
>combined? Even though the 2nd characters are NOT "combining"?

My reading is that they most certainly can be, and that is what the latest
implementation I've written does (it is a very literal, data-driven
implementation). See the implementation and test script at

http://lists.w3.org/Archives/Public/www-archive/2009Feb/0015.html

and the Perl script that turns the Unicode XML database into the SQLite
database the implementation needs:

http://lists.w3.org/Archives/Public/www-archive/2009Feb/0014.html

The NormalizationTest.txt file has the cases

  1B12;1B12;1B11 1B35;1B12;1B11 1B35;
  0CCB;0CCB;0CC6 0CC2 0CD5;0CCB;0CC6 0CC2 0CD5;
  0CCA;0CCA;0CC6 0CC2;0CCA;0CC6 0CC2;

Note that the header of that file states these invariants for the five
columns c1 to c5 of each line:

  # NFC
  # c2 == NFC(c1) == NFC(c2) == NFC(c3)
  # c4 == NFC(c4) == NFC(c5)
  #
  # NFD
  # c3 == NFD(c1) == NFD(c2) == NFD(c3)
  # c5 == NFD(c4) == NFD(c5)
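
The invariants can be checked mechanically against the three lines quoted
above. What follows is only a minimal sketch, not the implementation linked
earlier: it assumes Python's standard unicodedata module as the normalizer,
and the helper names parse_columns and check_line are made up for
illustration.

```python
import unicodedata

# The three NormalizationTest.txt lines quoted above (columns c1..c5).
LINES = [
    "1B12;1B12;1B11 1B35;1B12;1B11 1B35;",
    "0CCB;0CCB;0CC6 0CC2 0CD5;0CCB;0CC6 0CC2 0CD5;",
    "0CCA;0CCA;0CC6 0CC2;0CCA;0CC6 0CC2;",
]


def parse_columns(line):
    """Turn one test line into the five strings c1..c5 (hypothetical helper)."""
    fields = line.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in f.split()) for f in fields]


def check_line(line):
    """Return True if the NFC and NFD invariants hold for this line."""
    c1, c2, c3, c4, c5 = parse_columns(line)

    def nfc(s):
        return unicodedata.normalize("NFC", s)

    def nfd(s):
        return unicodedata.normalize("NFD", s)

    return (
        c2 == nfc(c1) == nfc(c2) == nfc(c3)
        and c4 == nfc(c4) == nfc(c5)
        and c3 == nfd(c1) == nfd(c2) == nfd(c3)
        and c5 == nfd(c4) == nfd(c5)
    )


results = [check_line(line) for line in LINES]
print(results)
```

The outcome depends on the Unicode data bundled with the Python build in
use, so it says something about that normalizer rather than about the
standard itself.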

Also note that the decomposition you give for U+0CCB is only the first
step: since U+0CCA itself decomposes into U+0CC6 U+0CC2, U+0CCB fully
decomposes into the sequence U+0CC6 U+0CC2 U+0CD5. Perhaps that is a
source of confusion?
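
To look at the single-step mapping and the full decomposition of U+0CCB
side by side, here is a small sketch. It assumes Python's unicodedata
module reflects the Unicode Character Database entries; mapping and
full_decomposition are hypothetical helper names.

```python
import unicodedata


def mapping(ch):
    """Single-step decomposition mapping from the character database."""
    return unicodedata.decomposition(ch).split()


def full_decomposition(ch):
    """Full canonical decomposition (NFD) as a list of hex code points."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


one_step = mapping("\u0ccb")         # mapping listed for U+0CCB
inner = mapping("\u0cca")            # mapping listed for U+0CCA
full = full_decomposition("\u0ccb")  # mappings applied recursively

print(one_step, inner, full)
```

The recursive application of the mappings is what produces the
three-character sequence that appears in column c3 of the test file.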

>So, if the answer is indeed "YES", one might add
>
>case 3: 0CCA, 0300, 0CD5 (admittedly unusual)
>
>which clearly should not compose since ccc(0300) >= ccc(0CD5)
>(http://www.unicode.org/review/pr-29.html)

I agree with this as well.
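
All three cases can be tried against an existing normalizer. This is a
sketch that assumes Python's unicodedata module and its bundled Unicode
data; it shows what that one implementation does, and nfc_hex is a
hypothetical helper name.

```python
import unicodedata


def nfc_hex(s):
    """NFC-normalize s and return the result as a list of hex code points."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFC", s)]


case1 = nfc_hex("\u1b11\u1b35")        # case 1: 1B11, 1B35
case2 = nfc_hex("\u0cca\u0cd5")        # case 2: 0CCA, 0CD5
case3 = nfc_hex("\u0cca\u0300\u0cd5")  # case 3: 0CCA, 0300, 0CD5

# Combining classes of the intervening mark and of the trailing character.
ccc = (unicodedata.combining("\u0300"), unicodedata.combining("\u0cd5"))

print(case1, case2, case3, ccc)
```

In case 3 the intervening U+0300 has a combining class that is not lower
than that of U+0CD5, so U+0CD5 is blocked from the starter and the sequence
stays as it is.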

-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
