Arabic encoding model (was Re: Arabic 16-bit encodings)

From: asadek@st-elias.com
Date: Sat Jul 02 2005 - 07:10:34 CDT


    N. Ganesan <naa.ganesan@gmail.com> wrote :
    >
    > Any 16-bit encodings for Arabic script other than Unicode?
    >

    At first I thought one needed 16 bits (well, more than 8 bits in any case) to represent all the Arabic characters.
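
    To make the 8-bit limit concrete, here is a minimal Python sketch. It assumes ISO 8859-6 as a representative 8-bit Arabic code page (my choice for illustration, not something from the question), and the helper name fits_in_8bit is hypothetical.

```python
def fits_in_8bit(text: str, codec: str = "iso8859_6") -> bool:
    """Return True if every character of `text` exists in the given 8-bit codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


beh = "\u0628"         # ARABIC LETTER BEH, part of the basic alphabet
noon_3dots = "\u06BD"  # ARABIC LETTER NOON WITH THREE DOTS ABOVE, an extended letter

basic_ok = fits_in_8bit(beh)
extended_ok = fits_in_8bit(noon_3dots)

# The same extended letter takes one 16-bit code unit in UTF-16.
utf16_width = len(noon_3dots.encode("utf-16-be"))

print(basic_ok, extended_ok, utf16_width)
```

    The basic letter fits in the 8-bit code page, while the extended letter is rejected and needs a 16-bit code unit. That holds only as long as the extended letter is a single precomposed character.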

    But I'm no longer so sure. It looks as though this is only true because Unicode chose to encode many Arabic letters with diacritics as precomposed characters (e.g. U+06BD NOON WITH THREE DOTS ABOVE). Only recently, in Unicode 4.1, did it add a few more combining marks (e.g. U+065A VOWEL SIGN SMALL V ABOVE), along with a whole string of supplementary Arabic characters (U+0750 and following), nearly all of which could have been encoded as a base Arabic letter plus a combining mark (e.g. U+0756 as BEH + SMALL V, U+0757 as HAH + TWO DOTS ABOVE).
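
    The status of the code points mentioned above can be checked with Python's standard unicodedata module. This is only a sketch: the helper name describe is hypothetical, and the results reflect whatever Unicode version ships with the interpreter (any Python 3 release is newer than Unicode 4.1).

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect the character-database properties relevant to composition."""
    return {
        "code": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed in this Unicode version>"),
        # Non-zero means the character is a combining mark.
        "combining_class": unicodedata.combining(ch),
        # Empty string means the character has no decomposition mapping.
        "decomposition": unicodedata.decomposition(ch),
    }


chars = ["\u06BD", "\u065A", "\u0756", "\u0757"]
info = {c: describe(c) for c in chars}
for c in chars:
    print(info[c])

# Normalization does not split the precomposed letter into base + mark ...
nfd_unchanged = unicodedata.normalize("NFD", "\u0756") == "\u0756"
# ... and does not fuse BEH + SMALL V ABOVE into the precomposed letter.
nfc_composed = unicodedata.normalize("NFC", "\u0628\u065A") == "\u0756"
print(nfd_unchanged, nfc_composed)
```

    In other words, the precomposed letters are atomic as far as normalization is concerned: a sequence of base letter plus combining mark would be a different, non-equivalent spelling.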

    Why have both combining marks and precomposed Arabic characters that use those same marks (the SMALL V, for instance)? Why encode any of the new precomposed Arabic characters at all? I thought this contradicted Unicode's policy of not encoding a new precomposed character (e.g. U+0756, introduced in Unicode 4.1) when it can be composed from an already encoded base character and a sequence of one or more combining marks. Why keep encoding, in one version of Unicode after another, new Arabic characters that could be coded as a base letter plus a combining mark? And why is there no combining THREE DOTS ABOVE or TWO DOTS ABOVE? Not having these diacritics seems a disservice to the languages that use the Arabic script: people whose Arabic letters are currently absent from Unicode will have to wait a few years, until the great Unicode and ISO experts have discovered them, to see them encoded.

    Is there any reason for this apparent mess?

     
    Ashraf Sadek

    --
    St Elias Coptic Community
    

