Re: Normalization test

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Mon, 10 Mar 2014 20:28:57 +0100

toNFC(0061 0305 0315 0300 05AE 0062) ->

From DerivedCombiningClass.txt
<http://www.unicode.org/Public/UCD/latest/ucd/extracted/DerivedCombiningClass.txt>:

  05D0..05EA ; 0 # Lo [27] HEBREW LETTER ALEF..HEBREW LETTER TAV

In other words, 05EA with combining class 0 is blocking the
composition and any reordering between

  (0061 0305 0315 0300) on one side, and

  (0062) on the other side (which is also combining class 0).
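
The combining class of each code point in the test sequence can be looked
up directly instead of being read off a range in the data file. The sketch
below assumes Python's standard unicodedata module, whose tables follow
whatever Unicode version the interpreter ships with; SEQUENCE and
combining_classes are names made up for this illustration. Note that the
range quoted above (05D0..05EA) does not contain 05AE, so the class of
05AE is best taken from the lookup itself.

```python
import unicodedata

# Code points of the test sequence discussed in this thread.
SEQUENCE = [0x0061, 0x0305, 0x0315, 0x0300, 0x05AE, 0x0062]


def combining_classes(code_points):
    """Return (code point, canonical combining class, name) triples."""
    return [
        (cp, unicodedata.combining(chr(cp)), unicodedata.name(chr(cp), "<unnamed>"))
        for cp in code_points
    ]


# Print one line per code point: hex value, combining class, character name.
for cp, ccc, name in combining_classes(SEQUENCE):
    print(f"{cp:04X}  ccc={ccc:<3}  {name}")
```

Any code point reported with a class other than 0 is a non-starter: it
takes part in canonical reordering and does not by itself delimit a
sequence.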

So you will effectively get the composition of 0061 and 0305 (because
it is also not specifically excluded from composition in
CompositionExclusions.txt
<http://www.unicode.org/Public/UCD/latest/ucd/CompositionExclusions.txt>)
in:

  toNFC(0061 0305 0315 0300 05AE 0062),

but NOT in:

  toNFC(0061 05AE 0305 0315 0300 0062).
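
Both orderings can be run through an existing normalizer to compare the
results side by side. This is only a quick check, assuming Python's
unicodedata.normalize as the reference implementation; to_nfc is a
hypothetical helper that accepts and returns the space-separated hex
notation used in this thread.

```python
import unicodedata


def to_nfc(hex_sequence):
    """Normalize a space-separated list of hex code points to NFC."""
    text = "".join(chr(int(h, 16)) for h in hex_sequence.split())
    normalized = unicodedata.normalize("NFC", text)
    return " ".join(f"{ord(ch):04X}" for ch in normalized)


# The two orderings discussed above, in the same hex notation.
for source in ("0061 0305 0315 0300 05AE 0062",
               "0061 05AE 0305 0315 0300 0062"):
    print(f"toNFC({source}) = {to_nfc(source)}")
```

If both lines print the same sequence, the two inputs are canonically
equivalent, and the position of 05AE in the input is not what decides
whether 0061 composes with a following mark.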

I think you have mixed the two separate test cases.

The first thing to check is to break sequences before every character with
combining class 0 (even if it is "combining", like here the Hebrew accent
zinor).
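
The splitting step can be sketched as follows. This is a simplified
illustration rather than a full normalizer: it assumes the input is
already fully decomposed (true for the code points in this thread, none of
which has a decomposition), it performs no composition, and the function
names are invented for the example. Whether a character starts a new run
is decided only by the combining class that unicodedata reports for it.

```python
import unicodedata


def split_at_starters(text):
    """Split text into runs that each begin at a character with ccc 0."""
    runs = []
    for ch in text:
        if unicodedata.combining(ch) == 0 or not runs:
            runs.append([ch])      # a starter (or the first character) opens a run
        else:
            runs[-1].append(ch)    # a non-starter joins the current run
    return runs


def reorder_run(run):
    """Stable-sort a run by combining class (canonical ordering).

    A starter has class 0, so it stays in front; marks with equal
    classes keep their relative order because sorted() is stable.
    """
    return sorted(run, key=unicodedata.combining)


def to_hex(chars):
    """Format characters as space-separated four-digit hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in chars)


text = "\u0061\u0305\u0315\u0300\u05AE\u0062"
for run in split_at_starters(text):
    print(f"{to_hex(run)}  ->  {to_hex(reorder_run(run))}")
```

Composition is then attempted only inside each reordered run, between the
starter and the marks that are not blocked by an earlier mark of equal or
higher combining class.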

2014-03-10 19:34 GMT+01:00 Markus Doppelbauer <doppelbauer_at_gmx.net>:

> Hello,
>
> I am working on a Unicode Normalization implementation. I have a question
> about a specific toNFC test rule.
>
> toNFC(0061 0305 0315 0300 05AE 0062) =>
> (0061 05AE 0305 0300 0315 0062)
> expected:
> (0061 05AE 0305 0300 0315 0062)
> \-------------/ =>
> (00E0 05AE 0305 0315 0062)
>
> Why don't 0061 and 0300 combine to 00E0?
>
> Thanks a lot
> Markus
>
>
> _______________________________________________
> Unicode mailing list
> Unicode_at_unicode.org
> http://unicode.org/mailman/listinfo/unicode
>
>

Received on Mon Mar 10 2014 - 14:28:57 CDT