A Reversible Transcription System for Arabic

Bernard Greenberg - Basis Technology Corporation

Intended Audience: Software Engineers
Session Level: Intermediate

Basis Technology's line of products supporting Modern Standard Arabic (MSA) in Unicode 4.0 is based on a novel transcription-transliteration technology for this canonical dialect. The technology solves many practical problems that attend traditional transliteration schemes as well as newer ones designed for data processing. Most notably, the scheme is fully expressible in ASCII (and includes presentations for even smaller character sets) without compromising readability, phonetic fidelity, or complete reversibility. Unlike most other schemes, the Basis system exploits standard MSA orthography to select its most concise and intuitive representations, and resorts to more exotic constructs only as text departs from that orthography. The scheme deliberately leaves certain orthographic details that are not relevant to meaning undefined and user-selectable, which makes the transcription engine a useful tool for applying such choices uniformly to an entire text. While the scheme is oriented towards fully vowelled Arabic, such as might be appropriate in databases, it includes a subset/fallback for the far more common nonvowelled case. The scheme exploits Unicode 4.0 features to surpass the capabilities of legacy systems.
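
To make the idea of complete reversibility in ASCII concrete, the following is a minimal sketch and not the Basis scheme itself, whose mappings this abstract does not give. It assumes a hypothetical toy table that assigns exactly one ASCII character to each of a handful of Arabic code points; the table and the function names to_ascii and to_arabic are invented for illustration only. Because the table is one-to-one, converting to ASCII and back returns the original text.

```python
# Hypothetical toy table (an assumption for illustration, NOT the Basis scheme):
# each Arabic code point maps to exactly one ASCII character.
TO_ASCII = {
    "\u0643": "k",  # ARABIC LETTER KAF
    "\u062A": "t",  # ARABIC LETTER TEH
    "\u0628": "b",  # ARABIC LETTER BEH
    "\u0627": "A",  # ARABIC LETTER ALEF
    "\u064E": "a",  # ARABIC FATHA (short vowel mark)
    "\u0650": "i",  # ARABIC KASRA (short vowel mark)
    "\u064F": "u",  # ARABIC DAMMA (short vowel mark)
}

# Invert the table; reversibility requires that no two entries share an ASCII value.
TO_ARABIC = {ascii_ch: arabic_ch for arabic_ch, ascii_ch in TO_ASCII.items()}
assert len(TO_ARABIC) == len(TO_ASCII), "toy table must be one-to-one"


def to_ascii(text: str) -> str:
    """Map each Arabic code point to its ASCII stand-in (KeyError if unmapped)."""
    return "".join(TO_ASCII[ch] for ch in text)


def to_arabic(text: str) -> str:
    """Map each ASCII character back to its Arabic code point (KeyError if unmapped)."""
    return "".join(TO_ARABIC[ch] for ch in text)


# Fully vowelled input: kaf, fatha, teh, fatha, beh, fatha.
vowelled = "\u0643\u064E\u062A\u064E\u0628\u064E"
# Nonvowelled input: kaf, teh, alef, beh.
nonvowelled = "\u0643\u062A\u0627\u0628"

print(to_ascii(vowelled))     # kataba
print(to_ascii(nonvowelled))  # ktAb
```

A one-character-per-code-point table is the simplest way to guarantee a lossless round trip, but it gives up the readability the abstract emphasizes (for example, digraphs for single letters); a scheme that uses multi-character representations would need additional rules to keep decoding unambiguous.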