Re: Arabic encoding model

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Sun Jul 03 2005 - 12:58:41 CDT

    Asadek St. Elias asked:

    > Why have combining marks and precomposed Arabic characters using these
    > combining marks (see the SMALL V for instance)? Why have encoded any of
    > the new precomposed Arabic characters? I thought this was contradictory
    > with Unicode's policy of encoding new precomposed characters (e.g. U+0756
    > introduced in Unicode 4.1) when it may be composed from an already encoded
    > base character and a sequence of one or more combining marks? Why encode,
    > version of Unicode after version of Unicode, new Arabic characters which
    > could be coded as a base and a combining mark (why no THREE DOTS ABOVE/TWO
    > DOTS ABOVE)?

    > Is there any reason to this apparent mess?

    How about initial ignorance of just how many of these combinations there
    were?

    The Latin script is partly helped by the tradition that vowels might be
    combined with any accent, though in many fonts the combinations are pretty
    poor. The concept that a base Arabic form could be combined with any
    combination of distinguishing dots (or other marks) wasn't formed, and would
    now be stymied by the 'stability pact' that requires that anything that is
    now in Normalization Form C (NFC) or Normalization Form D (NFD) remain so
    for ever.
    Also, should one allow 'dotless noon + combining two dots above' when its
    initial and medial forms would clash with 'teh' (U+062A)? I say dotless
    noon, but perhaps one would make normal 'noon' (U+0646) 'soft dotted'. That
    concept, for Latin 'i' and 'j', is itself quite recent.

    So far as I am aware, there is no way of composing such new consonants.
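
    As a rough illustration of the difference, here is a minimal sketch that
    assumes Python's standard unicodedata module. The helper names
    (nfd_code_points, is_atomic) are made up for this example. It shows that
    a precomposed Latin letter decomposes canonically into base plus mark,
    whereas dotted Arabic letters such as TEH (U+062A) and the BEH WITH
    SMALL V mentioned above (U+0756) have no canonical decomposition.

```python
import unicodedata


def nfd_code_points(ch):
    """Return the NFD form of a character as a list of 'U+XXXX' strings."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


def is_atomic(ch):
    """True if the character is unchanged by canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", ch) == ch


# Illustrative samples only: one Latin letter and three Arabic letters.
samples = [
    "\u00E9",  # LATIN SMALL LETTER E WITH ACUTE
    "\u062A",  # ARABIC LETTER TEH
    "\u0756",  # ARABIC LETTER BEH WITH SMALL V
    "\u0622",  # ARABIC LETTER ALEF WITH MADDA ABOVE
]

for ch in samples:
    name = unicodedata.name(ch)
    kind = "atomic" if is_atomic(ch) else "decomposes"
    print(f"U+{ord(ch):04X} {name}: {kind} -> {nfd_code_points(ch)}")
```

    The few Arabic letters that do decompose, such as ALEF WITH MADDA ABOVE
    (U+0622), use marks like madda and hamza. The distinguishing dots are not
    handled that way.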

    Richard.


