RE: Mapping Hindu Numbers for ISO8859_6 and Windows Cp1256

From: Hart, Edwin F. (Edwin.Hart@jhuapl.edu)
Date: Fri Feb 19 1999 - 08:19:33 EST


You have raised a valid concern about ISO/IEC 8859-6, Latin/Arabic.
However, the ambiguity of the conversion is due to an inherent
characteristic of the 8859-6 standard. The standard allows the digits in
code positions 0x30 to 0x39 to be rendered (displayed/printed) with either
the European-style glyphs (0..9) or the glyphs used in the Arabic script.
Thus, both translations to Unicode are valid. During the revision of the
8859-6 standard, I requested in the US committee that ISO specify one
ISO/IEC 10646 (Unicode) code position (rather than two) for each digit, but I
have not checked the results. I recall asking someone about this ambiguity
(I think that he was Mike Ksar) and was told that if the context were the
ASCII set, the digits would be rendered with the 0..9 glyphs, but if the
context were Arabic, then the digits would be rendered with the Arabic
shapes. Thus, if what my source said is correct, then in practice you will
need two translations for the 8859-6 code positions 0x30 to 0x39 into
Unicode. You will also need to map both the Unicode digits at 0x0030 to
0x0039 and the digits of the Arabic script into 8859-6 code positions
0x30 to 0x39. This provides neither a unique mapping nor a unique visual
mapping, but it does provide for round-trip integrity.
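
The two-way mapping described above can be sketched roughly as follows. This
is only an illustration under stated assumptions: the helper names
decode_8859_6 and encode_8859_6 and the arabic_digits flag are hypothetical,
the Arabic-script digits are assumed to be U+0660 to U+0669, and Python's
built-in "iso8859_6" codec (which maps 0x30 to 0x39 to U+0030 to U+0039) is
used as the base table.

```python
# Sketch only: helper names and the arabic_digits flag are hypothetical.
# Assumes the Arabic-script digits are U+0660..U+0669 (Arabic-Indic digits).

# U+0030..U+0039 -> U+0660..U+0669, applied after decoding when the
# context is Arabic.
TO_ARABIC_INDIC = {0x0030 + i: 0x0660 + i for i in range(10)}

# U+0660..U+0669 -> U+0030..U+0039, applied before encoding so that both
# Unicode digit sets land on 8859-6 positions 0x30..0x39.
TO_EUROPEAN = {0x0660 + i: 0x0030 + i for i in range(10)}


def decode_8859_6(data: bytes, arabic_digits: bool = False) -> str:
    """Decode 8859-6 bytes, choosing one of the two valid digit translations."""
    text = data.decode("iso8859_6")
    return text.translate(TO_ARABIC_INDIC) if arabic_digits else text


def encode_8859_6(text: str) -> bytes:
    """Encode to 8859-6, folding both Unicode digit sets onto 0x30..0x39."""
    return text.translate(TO_EUROPEAN).encode("iso8859_6")


if __name__ == "__main__":
    raw = b"\xc7\xe4 123"  # ALEF, LAM, space, digits 1 2 3 in 8859-6
    for flag in (False, True):
        decoded = decode_8859_6(raw, arabic_digits=flag)
        # Either translation re-encodes to the original bytes.
        assert encode_8859_6(decoded) == raw
```

In this sketch the round trip holds when starting from 8859-6 bytes: either
translation encodes back to the same bytes. Starting from Unicode, the
distinction between the two digit sets is lost on encoding unless the choice
is recorded somewhere outside the 8859-6 data.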

Ed Hart

Edwin F. Hart
Applied Physics Laboratory
11100 Johns Hopkins Road
Laurel, MD 20723-6099
+1-240-228-6926 (from Washington, DC area)
+1-443-778-6926 (from Baltimore area)
+1-240-228-1093 (fax)
edwin.hart@jhuapl.edu


