Re: Arabic Normalization chart

From: Kenneth Whistler (kenw@sybase.com)
Date: Fri May 09 2008 - 16:45:54 CDT


    > I am trying to understand the normalization chart for Arabic.
    > Why are certain glyphs not decomposed entirely under KD? For example:
    > \FBF0 ==> has KD = \064A\0654\06C7 instead of = \064A\0654\0648\0619
    > \FBDB ==> KD = \06C8 instead of = \0648\0670
    > Am I missing something?

    Yes.

    U+06C7 and U+06C8 have no decompositions.

    06C7;ARABIC LETTER U;Lo;0;AL;;;;;N;ARABIC LETTER WAW WITH DAMMAH;;;;
                                ^^

    06C8;ARABIC LETTER YU;Lo;0;AL;;;;;N;ARABIC LETTER WAW WITH ALEF ABOVE;;;;
                                 ^^
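
    The field marked with ^^ is the sixth semicolon-separated field
    (zero-based index 5) of a UnicodeData.txt record. As a minimal
    sketch, assuming the records are available as plain strings, the
    field can be read like this. The helper names are hypothetical, and
    the U+FBDB record is not quoted in this message; it is included
    only for contrast and should be checked against your own copy of
    UnicodeData.txt.

```python
# The U+06C7 and U+06C8 records are the ones quoted in the message.
# The U+FBDB record is an assumption added for contrast; verify it
# against your own copy of UnicodeData.txt.
RECORDS = [
    "06C7;ARABIC LETTER U;Lo;0;AL;;;;;N;ARABIC LETTER WAW WITH DAMMAH;;;;",
    "06C8;ARABIC LETTER YU;Lo;0;AL;;;;;N;ARABIC LETTER WAW WITH ALEF ABOVE;;;;",
    "FBDB;ARABIC LETTER YU ISOLATED FORM;Lo;0;AL;<isolated> 06C8;;;;N;;;;;",
]

DECOMPOSITION_FIELD = 5  # zero-based index of the decomposition mapping


def decomposition_field(record: str) -> str:
    """Return the raw decomposition field of one UnicodeData.txt record."""
    return record.split(";")[DECOMPOSITION_FIELD]


def has_decomposition(record: str) -> bool:
    """True if the record carries a canonical or compatibility mapping."""
    return decomposition_field(record) != ""


if __name__ == "__main__":
    for record in RECORDS:
        code_point = record.split(";")[0]
        field = decomposition_field(record)
        print(f"U+{code_point}: {field or '(no decomposition)'}")
```

    An empty field means the character is left unchanged by every
    normalization form, however its glyph looks in the chart.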

    You cannot infer formal decompositions for letters --
    particularly for Arabic -- simply by looking at the
    characters in the chart. To get the normative decomposition
    status of any particular character (which determines
    what its NFD or NFKD or NFC or NFKC normalizations will be),
    you have to look at the decomposition field in
    UnicodeData.txt (or check in NormalizationTest.txt).
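
    The same check can be made programmatically. The sketch below
    assumes Python and its standard unicodedata module, which exposes
    the decomposition field and the normalization forms for whatever
    Unicode version that Python build ships with. The helper names
    code_points and nfkd_code_points are hypothetical.

```python
import unicodedata


def code_points(text: str) -> list[str]:
    """Format each character of text as a U+XXXX string."""
    return [f"U+{ord(ch):04X}" for ch in text]


def nfkd_code_points(char: str) -> list[str]:
    """Return the NFKD normalization of char as U+XXXX strings."""
    return code_points(unicodedata.normalize("NFKD", char))


if __name__ == "__main__":
    # The four characters discussed in the thread.
    for char in ("\u06C7", "\u06C8", "\uFBF0", "\uFBDB"):
        # decomposition() returns the raw field, or "" if there is none.
        mapping = unicodedata.decomposition(char) or "(none)"
        print(
            f"U+{ord(char):04X} decomposition field: {mapping}; "
            f"NFKD: {' '.join(nfkd_code_points(char))}"
        )
```

    Because U+06C7 and U+06C8 have empty decomposition fields, NFKD
    stops at them: U+FBF0 ends in U+06C7 and U+FBDB becomes U+06C8,
    which matches the values in the chart.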

    --Ken


