RE: Updates to my IRANSYSTEM to Unicode table

From: Marco.Cimarosti@icl.com
Date: Fri Jan 21 2000 - 09:04:24 EST


In http://sina.sharif.ac.ir/~roozbeh/farsiweb/iransystem.txt Roozbeh used
ZWJ and ZWNJ as a very elegant way to specify the shapes of Arabic letters,
e.g.:

        0xE1 <ZWNJ>+0x0639+<ZWNJ> # ARABIC LETTER AIN, isolated form
        0xE2 <ZWJ>+0x0639+<ZWNJ> # ARABIC LETTER AIN, final form
        0xE3 <ZWJ>+0x0639+<ZWJ> # ARABIC LETTER AIN, medial form
        0xE4 <ZWNJ>+0x0639+<ZWJ> # ARABIC LETTER AIN, initial form

This notation also allowed him to easily represent categories that Unicode
does not define, like <final-isolated> or <initial-medial>, e.g.:

        0xA7 0x0633+<ZWNJ> # ARABIC LETTER SEEN, final-isolated form
        0xA8 0x0633+<ZWJ> # ARABIC LETTER SEEN, initial-medial form
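
To make the convention concrete, here is a minimal Python sketch of how a
table reader might recover the shape from the joiners around a letter. The
names (shape_of, FORMS) are hypothetical and are not part of Roozbeh's table;
only the six joiner patterns shown above are handled.

```python
ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER
ZWJ = 0x200D   # ZERO WIDTH JOINER

# (joiner before, joiner after) -> form name; None means "no joiner given".
# Only the combinations that appear in the examples above are listed.
FORMS = {
    (ZWNJ, ZWNJ): "isolated",
    (ZWJ, ZWNJ): "final",
    (ZWJ, ZWJ): "medial",
    (ZWNJ, ZWJ): "initial",
    (None, ZWNJ): "final-isolated",
    (None, ZWJ): "initial-medial",
}

def shape_of(seq):
    """Return the form implied by the joiners around the letter(s) in seq.

    seq is a list of code points such as [ZWJ, 0x0639, ZWNJ].
    Patterns not listed in FORMS are reported as "unspecified".
    """
    before = seq[0] if seq[0] in (ZWNJ, ZWJ) else None
    after = seq[-1] if seq[-1] in (ZWNJ, ZWJ) else None
    return FORMS.get((before, after), "unspecified")
```

For instance, shape_of([ZWJ, 0x0639, ZWNJ]) corresponds to the 0xE2 entry
and shape_of([0x0633, ZWJ]) to the 0xA8 entry.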

Why not use a similar notation also in the Unicode database
(ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt) in place of the
special compatibility classes <isolated>, <final>, <initial> and <medial>?

FEC9;ARABIC LETTER AIN ISOLATED FORM;...;<isolated> 0639;...
FECA;ARABIC LETTER AIN FINAL FORM;Lo;...;<final> 0639;...
FECB;ARABIC LETTER AIN INITIAL FORM;...;<initial> 0639;...
FECC;ARABIC LETTER AIN MEDIAL FORM;...;<medial> 0639;...

These entries would perhaps be easier to process automatically if they were
expressed like this:

FEC9;ARABIC LETTER AIN ISOLATED FORM;...;<compat> 200C 0639 200C;...
FECA;ARABIC LETTER AIN FINAL FORM;Lo;...;<compat> 200D 0639 200C;...
FECB;ARABIC LETTER AIN INITIAL FORM;...;<compat> 200C 0639 200D;...
FECC;ARABIC LETTER AIN MEDIAL FORM;...;<compat> 200D 0639 200D;...

But I know that *canonical* mappings for existing characters are now locked
for backward compatibility. I wonder whether *compatibility* mappings are
also locked for the same reason?
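
As a rough illustration of the rewriting suggested above, the following
Python sketch converts a decomposition field from the current notation to
the proposed one. The function and table names are hypothetical, and the
<compat> form is only the proposal made in this message, not anything the
Unicode database actually contains.

```python
# Joiners (before, after) that would replace each positional tag.
# 200C is ZWNJ, 200D is ZWJ.
JOINERS = {
    "<isolated>": ("200C", "200C"),
    "<final>": ("200D", "200C"),
    "<initial>": ("200C", "200D"),
    "<medial>": ("200D", "200D"),
}

def rewrite_decomposition(field):
    """Rewrite e.g. '<final> 0639' as '<compat> 200D 0639 200C'.

    Fields that are empty or carry any other tag are returned unchanged.
    """
    parts = field.split()
    if not parts or parts[0] not in JOINERS:
        return field
    before, after = JOINERS[parts[0]]
    return " ".join(["<compat>", before, *parts[1:], after])
```

Applied to the decomposition field of FECA, "<final> 0639", this gives the
value shown in the proposed FECA line above.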

_ Marco


