L2/06-277

From: Deborah Anderson [mailto:dwanders@sonic.net]
Sent: Thursday, August 03, 2006 4:12 PM
To: 'kabir@iiu.edu.my'
Subject: Regarding encoding Jawi characters in Unicode

Dr. Abdul Kabir Hassain Solihu
Dept. of General Studies
Kulliyyah of Islamic Revealed Knowledge & Human Sciences
International Islamic University Malaysia
P.O. Box 10
50728 Kuala Lumpur
Malaysia

Dear Dr. Abdul Kabir Hassain Solihu,

I am writing in response to your letter to the Unicode Consortium dated 20 April 2006. I run a project at UC Berkeley that works with various user communities to ensure that their characters are included in Unicode, and I also work closely with the Unicode Technical Committee.

I have reviewed your proposal with other members of the Unicode Technical Committee. I believe all the characters you are requesting are already in Unicode. For Jawi, it is quite fortunate that the characters are already encoded, as you will not have to wait several years to use them. Since Unicode encodes scripts, not languages, it is quite appropriate to use the characters in the Arabic block.

In Appendix B you have identified the correct Unicode codepoints for 5 of the characters from the Arabic block:

CHA = U+0686 ARABIC LETTER TCHEH
PA = U+06A4 ARABIC LETTER VEH
NGA = U+06A0 ARABIC LETTER AIN WITH THREE DOTS ABOVE
VA = U+06CF ARABIC LETTER WAW WITH DOT ABOVE
NYA = U+06BD ARABIC LETTER NOON WITH THREE DOTS ABOVE

[Note the comment in Chapter 8 of The Unicode Standard (available at http://www.unicode.org/versions/Unicode4.0.0/ch08.pdf): "Jawi: U+06BD ARABIC LETTER NOON WITH THREE DOTS ABOVE is used for Jawi, which is Malay written using the Arabic script. Malay users know the character as Jawi Nya. Contrary to what is suggested by its Unicode character name, U+06BD displays with three dots below the letter when it is in the initial or medial position.
This is done to avoid confusion with U+062B ARABIC LETTER THEH, which appears in words of Arabic origin, and which has the same base letter shapes in initial or medial position, but with three dots above in all positions."]

The one character for which you did not identify a codepoint is in the Arabic Supplement block:

GA = U+0762 ARABIC LETTER KEHEH WITH DOT ABOVE

Appendix B in your proposal lists the contextual forms, which are drawn from the Arabic Presentation Forms block. I would recommend that these presentation form characters (those from the FEXX block) not be used. Instead, use the characters listed above from the Arabic and Arabic Supplement blocks, and rely on implementations that can perform glyph shaping (by rendering rules), accessing the appropriate glyphs in fonts. (The Arabic presentation forms were included in Unicode for compatibility with pre-existing standards and legacy implementations.)

If you have additional questions, please feel free to contact me.

With best wishes,
Deborah Anderson

Deborah Anderson, Ph.D.
Researcher, Dept. of Linguistics, UC Berkeley
Proj. Leader, Script Encoding Initiative
http://linguistics.berkeley.edu/sei
NOTE NEW Email: dwanders@sonic.net (or dwanders@berkeley.edu)
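
The codepoint assignments listed in this letter can be checked mechanically. The following is a minimal sketch, assuming Python's standard unicodedata module; the names JAWI_LETTERS, describe, and to_base_letters are hypothetical and chosen only for illustration. The use of NFKC normalization to fold presentation forms back to base letters is likewise an assumption: the letter recommends avoiding the presentation forms but does not prescribe a conversion method.

```python
import unicodedata

# The six Jawi letters and the codepoints given in the letter.
# The dictionary name and labels are illustrative only.
JAWI_LETTERS = {
    "CHA": "\u0686",
    "PA": "\u06A4",
    "NGA": "\u06A0",
    "VA": "\u06CF",
    "NYA": "\u06BD",
    "GA": "\u0762",
}


def describe(mapping):
    """Return 'U+XXXX CHARACTER NAME' for each labelled character."""
    return {
        label: f"U+{ord(ch):04X} {unicodedata.name(ch)}"
        for label, ch in mapping.items()
    }


def to_base_letters(text):
    """Fold compatibility characters, including Arabic presentation
    forms, to their base letters using NFKC normalization.

    NFKC also folds other compatibility characters, so this is one
    possible approach for legacy text, not a general recommendation.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for label, text in describe(JAWI_LETTERS).items():
        print(f"{label} = {text}")
```

For example, to_base_letters("\uFB7C") turns ARABIC LETTER TCHEH INITIAL FORM into U+0686; the contextual shape is then left to the rendering system, as recommended above.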