Multilingual Collation in Two Middle Eastern Script Families
Elaine Renee Keown - Independent Researcher in Computational Semitics, Philadelphia
Statement of Purpose:
This paper explains the collating (alphabetic sorting) properties of two historically related Middle Eastern script families: Hebrew-Aramaic and Perso-Arabic.
Both Hebrew and Arabic scripts were used to write many languages during the last 2900 years. Both scripts descend directly from the newly discovered 3rd millennium B.C. alphabetic inscriptions in Egypt's Western Desert. In this paper we focus on how Hebrew-Aramaic and Perso-Arabic scripts collate differently from scripts descended via Greek, especially when used in a multilingual situation.
Alphabets descended through Greek, such as Roman and Cyrillic, usually developed capital letters in the course of their script history. For computer collation they therefore need a collation table to interweave the lower-case and upper-case letters. Such a collation table slows down database software by about 50%.
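The role of a collation table for a cased script can be sketched in a few lines of Python. This is only an illustration: the `TABLE` mapping and the `table_key` helper are hypothetical names, the weights are invented for the example, and real collation tables are far larger and multi-level.

```python
import string

words = ["banana", "Apple", "apple", "Banana", "cherry"]

# Plain code-point order: every capital sorts before every lower-case letter.
by_code_point = sorted(words)

# Hypothetical collation table that interweaves the two cases:
# a < A < b < B < c < C < ...
TABLE = {}
for i, (lower, upper) in enumerate(
    zip(string.ascii_lowercase, string.ascii_uppercase)
):
    TABLE[lower] = 2 * i
    TABLE[upper] = 2 * i + 1


def table_key(word):
    """Translate each letter into its weight from the collation table."""
    return [TABLE[ch] for ch in word]


# Every comparison now goes through a table lookup per letter.
by_table = sorted(words, key=table_key)

print(by_code_point)
print(by_table)
```

Without the table the capitalized words cluster at the front of the list; with it, each capitalized word sits next to its lower-case counterpart, at the cost of one lookup per letter.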
Perso-Arabic and Hebrew-Aramaic, however, never developed capitals. For multilingual Hebrew-Aramaic, a collation table is not necessary: the two dozen alphabet variants for Hebrew-Aramaic can be interwoven in one chain that provides a separate subcollation for each language. Perso-Arabic, by contrast, has a more complex script history and is used to write over 100 languages, so it needs a collation table for the end of the alphabet. Perso-Arabic script developed in two directions, one based directly on Arabic and one on the Persian Arabic script. Languages such as Urdu, Panjabi, Pashto, Sindhi, and Siraiki collate multilingually with the Persian Arabic script.
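The table-free case for an uncased script can be illustrated with a minimal Python sketch. It assumes the existing Unicode Hebrew block (U+05D0 to U+05EA), where the letters are encoded in alphabetical order with each final form adjacent to its base letter; it stands in for, and is not, the multilingual chain of variants described above. The sample words are chosen only for the example.

```python
# Sample words written with escapes to avoid bidirectional display issues.
words = [
    "\u05E9\u05DC\u05D5\u05DD",  # shalom (shin, lamed, vav, final mem)
    "\u05D2\u05DE\u05DC",        # gamal  (gimel, mem, lamed)
    "\u05D0\u05D1",              # av     (alef, bet)
    "\u05D1\u05D9\u05EA",        # bayit  (bet, yod, tav)
]

# No collation table: Python compares strings code point by code point,
# and there is no second (capital) series of letters to interweave.
sorted_words = sorted(words)

# First letter of each sorted word, shown as a code point.
first_letters = [hex(ord(word[0])) for word in sorted_words]
print(first_letters)
```

Because the encoded order already matches the alphabetical order, the sort key is the string itself and no per-letter lookup is required.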
Future versions of Unicode will include more variant letters for both the Hebrew-Aramaic and Perso-Arabic script families. Languages using Perso-Arabic script are spoken by 400 million people in countries that are becoming more computerized every month. The special multilingual collation found in these two script families should be built into future versions of Unicode-compatible algorithms.
12 December 2000, Webmaster