Re: Arabic Normalization chart

From: Maha Hassan (maha.hassan96@yahoo.com)
Date: Sat May 10 2008 - 01:22:16 CDT

Thank you for the explanation. Correction: I meant U+064F (which existed in 1.0) and not U+0619 (which confuses me even more).

I have another question: why the introduction of U+0618..U+061A? I understand that they are used in Koranic display, but why duplicate already existing marks when both have exactly the same effect on pronunciation? Koranic display can be resolved at the font level, not in the encoding.

Thanks,
Maha

----- Original Message ----
From: Kenneth Whistler <kenw@sybase.com>
To: maha.hassan96@yahoo.com
Cc: unicode@unicode.org; kenw@sybase.com
Sent: Friday, May 9, 2008 5:43:08 PM
Subject: Re: Arabic Normalization chart

> Thanks for the references.
> But, why U+06C7 has no decomposition? I can enter from Arabic
> keyboard U+0648\U+0619 and get the exact glyph in U+06C7.
> How come u+0623 has a decomposition and not U+06C7?
> What the criteria?

It is an interaction of the requirements for normalization stability with the timing of the addition of various characters for the Arabic script.

U+06C7 was already an encoded character in Unicode as of Version 1.1, dating back to 1993. The "composition version" for Unicode normalization stability is defined to be Version 3.1, dating back to 2001. See

http://www.unicode.org/reports/tr15/#Versioning

for details. Among other things, that means that no character that was either decomposed or *not* decomposed as of Version 3.1 can ever have its decomposition status changed by a later version of the standard.

Those few Arabic letters that *do* have decompositions, such as U+0622..U+0626, were *already* decomposed as of Version 3.1, based on U+0653..U+0655 (madda and/or hamza above or below), which were also already encoded as of Version 3.1.

But combining marks added *after* Version 3.1 cannot be used in decompositions of Arabic characters encoded *before* Version 3.1 (or indeed those added in any version earlier than when the combining marks themselves were added).

U+0619 ARABIC SMALL DAMMA was just added in Unicode Version 5.1, so it cannot be used to decompose any Arabic character from earlier versions. To do so would destabilize the normalization of Unicode data. See:

http://www.unicode.org/policies/stability_policy.html#Normalization

for the formal statement of this requirement for stability.

Also, it should be noted that U+0619 (and similar characters in the range U+0610..U+0618) are really intended for honorifics and Koranic annotation -- they are not nuqtas used as diacritics to create new Arabic characters. So, for example, U+0615 ARABIC SMALL HIGH TAH is an annotation mark, and cannot be used to decompose U+0679 ARABIC LETTER TTEH (which looks like a dotless beh with a small high tah diacritic) or U+06BB ARABIC LETTER RNOON (which looks like a noon ghunna with a small high tah diacritic). So even though you could type such combinations and have them appear like those letters, they would not be canonical equivalents, nor would applications consider them to compare equal to each other.

I realize that this is complicated and not at all self-evident from just using an Arabic keyboard and looking at the Unicode charts. But the constraints are in place because of the overriding requirement to keep Unicode normalization stable, not only for Arabic, but for all Unicode characters.

--Ken
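
For readers who want to check the behavior described above, here is a minimal sketch using Python's standard unicodedata module. It is not part of the original exchange. The helper names has_canonical_decomposition and canonically_equivalent are hypothetical, chosen only for this illustration, and the results reflect whichever Unicode version the local Python build ships with.

```python
import unicodedata


def has_canonical_decomposition(ch: str) -> bool:
    """Return True if ch has a canonical (not compatibility) decomposition.

    Hypothetical helper for illustration. Compatibility mappings in the
    Unicode Character Database start with a tag such as "<isolated>".
    """
    mapping = unicodedata.decomposition(ch)
    return bool(mapping) and not mapping.startswith("<")


def canonically_equivalent(a: str, b: str) -> bool:
    """Return True if a and b are identical after NFD normalization."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    # U+0623 was decomposed as of Version 3.1 (alef + hamza above).
    print(has_canonical_decomposition("\u0623"))
    # U+06C7 has no decomposition, and that status cannot change.
    print(has_canonical_decomposition("\u06C7"))
    # Alef + hamza above is canonically equivalent to U+0623.
    print(canonically_equivalent("\u0627\u0654", "\u0623"))
    # Waw + small damma may look like U+06C7 but is not equivalent to it.
    print(canonically_equivalent("\u0648\u0619", "\u06C7"))
    # Dotless beh + small high tah is likewise not equivalent to U+0679.
    print(canonically_equivalent("\u066E\u0615", "\u0679"))
```

Comparing NFD forms is only one way to test canonical equivalence; comparing NFC forms would give the same answers for these inputs.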


