RE: Question about a Normalization test

From: Whistler, Ken <ken.whistler_at_sap.com>
Date: Thu, 23 Oct 2014 18:15:00 +0000

Aaron Cannon asked:



> Hi all, from the latest version of the standard, on line 16977 of the
> normalization tests, I am a bit confused by the NFC form. It appears
> incorrect to me. Here's the line, sans comment:
>
> 0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;0061 05AE
> 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300
> 0315 0062;
>
> Just looking at column 2, which according to the comments at the top
> is the NFC form:
>
> 0061 05AE 0305 0300 0315 0062:
>
> This, however, does not appear to be in NFC form.
>
> The first character, and the second or third characters do not
> compose. However, the first and fourth (0061 and 0300) do, composing
> to 00E0.
>
> Since there are no further compositions, the normalized form should be
> 00E0 05AE 0305 0315 0062
>
> What am I missing?



Input is:

  Code points: 0061 0305 0315 0300 05AE 0062
  Ccc:            0  230  232  230  228    0

Output of canonical reordering is:

  Code points: 0061 05AE 0305 0300 0315 0062
  Ccc:            0  228  230  230  232    0
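
The reordering step above can be reproduced with a short Python sketch. It assumes the input is already fully decomposed (true for this test case) and uses the standard library's unicodedata.combining() to look up ccc values. The helper name canonical_reorder is made up for this illustration and is not part of any library.

```python
import unicodedata

def ccc(cp):
    """Canonical combining class of a code point."""
    return unicodedata.combining(chr(cp))

def canonical_reorder(code_points):
    """Stable-sort each run of non-starters (ccc > 0) by ccc.

    Hypothetical helper for illustration; assumes the input is
    already canonically decomposed.
    """
    result = []
    run = []  # combining marks seen since the last starter
    for cp in code_points:
        if ccc(cp) == 0:
            # A starter ends the current run: flush the sorted marks.
            result.extend(sorted(run, key=ccc))
            run = []
            result.append(cp)
        else:
            run.append(cp)
    result.extend(sorted(run, key=ccc))  # flush any trailing marks
    return result

source = [0x0061, 0x0305, 0x0315, 0x0300, 0x05AE, 0x0062]
reordered = canonical_reorder(source)
print(" ".join(f"{cp:04X}" for cp in reordered))
print(" ".join(str(ccc(cp)) for cp in reordered))
```

Python's sorted() is stable, which matters here: 0305 and 0300 both have ccc=230, so they must keep their original relative order.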



Next step is to start from 0061 and test each successive combining
mark, looking for composition candidates.

0061 does not compose with 05AE.
0061 does not compose with 0305.
0061 *could* compose with 0300 (00E0 = 0061 + 0300), *but*
0300 is *blocked* from 0061 by the intervening combining
mark 0305, which has the *same* ccc value (230) as 0300. So the
composition does not occur.
0061 does not compose with 0315.
The next character is 0062, ccc=0, a starter, so we are done.
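
For anyone who wants to trace the walk in code, here is a rough Python sketch of the composition pass with the blocking check. It is not a reference implementation. It assumes the input is already decomposed and canonically reordered and begins with a starter. The helper names primary_composite and compose are invented for this example, and the pair lookup takes a shortcut by asking unicodedata.normalize() whether a two-character string collapses to one character, rather than reading the composition data directly.

```python
import unicodedata

def primary_composite(starter, mark):
    """Return the code point that starter+mark composes to, or None.

    Shortcut for illustration: lets the library decide whether the
    pair has a primary composite.
    """
    composed = unicodedata.normalize("NFC", chr(starter) + chr(mark))
    return ord(composed) if len(composed) == 1 else None

def compose(code_points):
    """Compose a reordered sequence, skipping blocked candidates.

    Assumes a non-empty list that begins with a starter.
    """
    result = [code_points[0]]
    starter_pos = 0   # index in result of the most recent starter
    last_ccc = 0      # ccc of the last character kept after that starter
    for cp in code_points[1:]:
        ccc = unicodedata.combining(chr(cp))
        composite = primary_composite(result[starter_pos], cp)
        # Not blocked if nothing intervenes (last_ccc == 0) or the
        # last intervening mark has a strictly lower ccc.
        if composite is not None and (last_ccc == 0 or last_ccc < ccc):
            result[starter_pos] = composite
        else:
            if ccc == 0:
                starter_pos = len(result)  # cp becomes the new starter
            last_ccc = ccc
            result.append(cp)
    return result

blocked = compose([0x0061, 0x05AE, 0x0305, 0x0300, 0x0315, 0x0062])
print(" ".join(f"{cp:04X}" for cp in blocked))

# Same sequence without 0305: nothing with ccc >= 230 intervenes.
unblocked = compose([0x0061, 0x05AE, 0x0300, 0x0315])
print(" ".join(f"{cp:04X}" for cp in unblocked))
```

The second call is there for contrast: once 0305 is removed, the only mark between 0061 and 0300 is 05AE (ccc=228, lower than 230), so 0300 is no longer blocked and the sequence does compose to 00E0.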



For the relevant definitions, see:

http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf#G50628

and scroll down a couple pages to D115 on p. 139.



Test cases like this are included in NormalizationTest.txt precisely
to ensure that implementations are correctly detecting these
sequences where composition is blocked.
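
As a quick sanity check, the test line quoted above can be run against an existing implementation. The sketch below uses Python's standard unicodedata module. The parse_columns helper is made up for this example, and the invariants checked are the ones the column layout implies: column 2 is the NFC form of column 1, and column 3 is its NFD form.

```python
import unicodedata

# The test line from the question, without its trailing comment.
LINE = (
    "0061 0305 0315 0300 05AE 0062;0061 05AE 0305 0300 0315 0062;"
    "0061 05AE 0305 0300 0315 0062;0061 05AE 0305 0300 0315 0062;"
    "0061 05AE 0305 0300 0315 0062;"
)

def parse_columns(line):
    """Split a test line into its five columns as Python strings.

    Hypothetical helper for illustration only.
    """
    fields = line.split(";")[:5]
    return ["".join(chr(int(cp, 16)) for cp in field.split())
            for field in fields]

c1, c2, c3, c4, c5 = parse_columns(LINE)
nfc_of_source = unicodedata.normalize("NFC", c1)

# Does the library agree with column 2?
print("NFC(c1) == c2:", nfc_of_source == c2)
# Did a composed 00E0 sneak in despite the blocking mark?
print("contains 00E0:", "\u00e0" in nfc_of_source)
print("NFD(c1) == c3:", unicodedata.normalize("NFD", c1) == c3)
```

An implementation that ignores blocking would produce 00E0 here and fail the first comparison.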



--Ken


Received on Thu Oct 23 2014 - 13:16:31 CDT
