Arab Marks

From: Arno Schmitt
Date: Wed Jul 14 2010 - 01:46:27 CDT

    The marks in the Arabic block are not well organized;
    they belong to eleven mark classes, eight for marks above the base character, three for marks below.
    In Unicode logic marks within a class with a lower number should be closer to the base character than those within a class with a higher number. So kasratan (mark class 29) should be closer to the base character than kasra and small kasra (mark class 32), and all of these should be closer to the base character than all other marks below (mark class 220).
    As anybody familiar with the Arabic script knows, this is not the case.
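
    The class numbers quoted above can be checked against the Unicode Character Database. What follows is a minimal sketch, assuming Python's standard `unicodedata` module (its values come from whatever Unicode version ships with the interpreter). The helper name `mark_classes` and the selection of sample marks are illustrative choices, not part of any standard.

```python
import unicodedata

# Sample Arabic marks discussed in this message (code point -> informal name).
SAMPLE_MARKS = {
    0x064B: "fathatan",
    0x064C: "dammatan",
    0x064D: "kasratan",
    0x064E: "fatha",
    0x064F: "damma",
    0x0650: "kasra",
    0x0651: "shadda",
    0x0652: "sukun",
    0x0670: "superscript alif",
    0x0656: "subscript alif",
    0x06E1: "head of kha",
    0x0657: "inverted damma",
}

def mark_classes(marks):
    """Return {informal name: canonical combining class} for the given marks."""
    return {name: unicodedata.combining(chr(cp)) for cp, name in marks.items()}

if __name__ == "__main__":
    # Print the marks ordered by class number, lowest (nominally closest) first.
    for name, ccc in sorted(mark_classes(SAMPLE_MARKS).items(), key=lambda kv: kv[1]):
        print(f"{ccc:3d}  {name}")
```

    Sorting by class number makes the mismatch visible: kasratan sorts before kasra, and head of kha lands in the generic class 230 far from sukun.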

    Should we try to remedy this?
    Is there any software that uses the mark classes directly?
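
    One place where the classes are used directly is Unicode normalization: canonical reordering sorts a run of combining marks by class number. A small sketch, assuming Python's standard `unicodedata` module; the base letter beh and the variable names are only examples.

```python
import unicodedata

BEH, SHADDA, KASRA = "\u0628", "\u0651", "\u0650"

# Shadda (class 33) is typed before kasra (class 32).
typed = BEH + SHADDA + KASRA
normalized = unicodedata.normalize("NFC", typed)

# Canonical reordering puts the lower class first, so kasra now precedes shadda.
print([f"U+{ord(ch):04X}" for ch in normalized])
```

    Any software that normalizes text therefore depends on the class values, whether or not it uses them for positioning.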

    Let's look at the position of all the marks irrespective of their current Unicode mark class.

    Closest to the skeleton letter come the diacritical dots which modify the value of the letter. In Unicode -- as in modern Arabic -- these dots are considered part of the base character (therefore hundreds of characters have to be encoded instead of about twenty letters plus about twenty diacritics).

    Next come the signs that modify the pronunciation of the letter.
    Small high seen (06DC) and small low seen (06E3) belong clearly to this category: they tell the reader NOT to pronounce the sad that is written, but to pronounce a seen instead.
    Hamza above (0654) and hamza below (0655) belong to that category too: they tell the reader that the vowel letter is not to be pronounced as a vowel but is a carrier of hamza. Here one has to keep in mind that the letter hamza (0621) did not at first belong to the Arabic script; originally hamza was not written or (more often) was written with a vowel letter [or maybe in the pronunciation that was written down there was no hamza where one is spoken today].
    Small high meem (06E2) and small low meem (06ED) belong here as well [Unicode names are unsystematic: "hamza below" but "low seen," "hamza above" but "high meem"]. In most editions of the Koran in the Arab world the vowelless noon [nūn sākin] is written only with the high meem, so it is the nearest sign to the noon by default; in Turkey and Iran this noon normally has a sukūn or jazm, but no small meem; in the East the rule pronunciation-aid-closer-than-vowel-sign (see below) is applied.
    Small high rounded zero (06DF), small high upright rectangular zero (06E0) and the wasla sign (at present only as part of U+0671) are pronunciation modifiers too: they tell the reader that the letter is (sometimes) silent, and they sit very close to the letter.

    Next comes the shadda,
    then the vowel signs. I don't see any good reason why fatha (0618 + 064E) and damma (0619 + 064F), sukun (0652) and head of kha (06E1) are put into different mark classes. I think even fathatan (064B) and dammatan (064C) should be in the same class, but that is less clear-cut. Without any doubt they come after the shadda and before the koranic pause signs. Just as Hebrew waw and shin attract a preceding or following holam, one can argue that a following upright alif attracts fathatan and dammatan, i.e. pulls them up and/or to the left. But this is not always the case, and even when it happens, it is not a sufficient reason for not putting them into the same mark class as the other vowel signs above.
    Putting some vowel signs into different mark classes is inelegant and inefficient, but does no great harm. It is worse that some vowel signs are put into mark class 230, the generic far above class: head of kha (06E1) and inverted damma (0657) are no different from sukun and damma. And superscript alif (0670), which according to Unicode is "actually a vowel sign", has its proper place together with the rest of the vowel signs.

    Likewise below: after the -- uncounted -- diacritic dots come the vowel signs: kasra, kasratan and subscript alif (the first two in mark classes 32 and 29, the last in the generic below class 220).

    Next come -- in the Ottoman tradition only -- the pronunciation aids below the letter. In most cases they occur directly below a letter that has a vowel sign above, but in the few cases with a kasra they sit below the vowel sign. I would class the Ottoman pause sign skth -- which sits in South Asia on the base line between two words and is a normal (i.e. high sitting) pause sign in Q24 -- together with the Ottoman pronunciation aids, as it is half a pause sign and half a pronunciation aid: it asks for the shortest of pauses, just enough to prevent assimilation.

    Then -- highest above the letter -- come the pause signs.
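
    The ordering argued for above (pronunciation modifiers, then shadda, then vowel signs, then pause signs) can be sketched as a sort key. This is only an illustration: the tier numbers, the table `PROPOSED_TIER` and the function `order_above_marks` are hypothetical, only a handful of marks above the letter are listed, and U+06DA (small high jeem) stands in for the pause signs, which this message does not list by code point.

```python
# Hypothetical tiers for marks above the letter; 1 is closest to the base.
PROPOSED_TIER = {
    "\u06DC": 1,  # small high seen (pronunciation modifier)
    "\u0654": 1,  # hamza above
    "\u06E2": 1,  # small high meem
    "\u0651": 2,  # shadda
    "\u064E": 3,  # fatha
    "\u064F": 3,  # damma
    "\u0652": 3,  # sukun
    "\u06E1": 3,  # head of kha
    "\u0657": 3,  # inverted damma
    "\u0670": 3,  # superscript alif
    "\u06DA": 4,  # small high jeem (assumed example of a Koranic pause sign)
}

def order_above_marks(marks):
    """Sort marks from closest to the base letter to farthest.

    sorted() is stable, so marks in the same tier keep their input order.
    Characters missing from the table raise KeyError rather than being guessed.
    """
    return sorted(marks, key=lambda ch: PROPOSED_TIER[ch])

if __name__ == "__main__":
    sample = ["\u06DA", "\u064E", "\u0651"]  # pause sign, fatha, shadda
    print([f"U+{ord(ch):04X}" for ch in order_above_marks(sample)])
```

    Unlike the existing class numbers, such a table would keep all vowel signs above in one tier.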

    I ignored the madda sign, because it is not the same sign in Standard Arabic and in Koranic Arabic.
