Re: kurdish sorani

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Mon Aug 28 2006 - 19:18:17 CDT

    From: "John Hudson" <john@tiro.ca>
    To: "Andries Brouwer" <aebr@win.tue.nl>
    >> Indeed, ARABIC LETTER HEH (U+0647) is a letter with 4 glyph forms.
    >> In Kurdish (written in the Sorani, essentially Arabic, alphabet)
    >> one has two letters (let me call them Kurdish H and Kurdish E)
    >> and these 4 glyph forms become the two forms of Kurdish H
    >> and the two forms of Kurdish E.
    >> Initial and medial Heh are forms of Kurdish H, final and
    >> independent Heh are forms of Kurdish E.
    >> Kurdish E never joins to the following letter, so needs only two forms.
    >> Initial, final and independent Kurdish H are all written with initial Heh.
    >
    > The obvious solution would be to use U+06D5 for the Kurdish ə, which performs the same
    > function in the orthography of e.g. Uighur.

    Except that U+06D5 TEH MARBUTAH (right-joining) has two dots above both of its forms, final and isolated, and occurs only in trailing contexts, where it would normally always join with the preceding character. This junction (and the extra dots) would be parasitic when the Kurdish E character occurs in an initial or medial context.

    If I understand the issue correctly, the conflicts in the use of the four contextual forms of U+0647 (ARABIC LETTER HEH) can be resolved as follows:
    * Kurdish H (left-joining):
        initial= initial Arabic heh
        medial= medial Arabic heh
        final= initial Arabic heh (*conflict solved by *prepending* ZWNJ before HEH ?)
        isolated= initial Arabic heh (*conflict solved by appending ZWJ after HEH)
    * Kurdish E (right-joining):
        initial= isolated Arabic heh (*conflict solved by appending ZWNJ ?)
        medial= final Arabic heh (*conflict solved by appending ZWNJ)
        final= final Arabic heh
        isolated= isolated Arabic heh

    But the problem with ZWJ and ZWNJ above is that they also change the joining form of the surrounding letters. So we would need to insert a further ZWJ or ZWNJ between the HEH and each neighbouring letter, to give those letters their correct forms. Hmmmm.... We would then end up with sequences <ZWNJ,ZWJ> or <ZWJ,ZWNJ> to control the letter form on both sides!!!
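
    As a rough illustration, the table above can be transcribed into a few lines of Python. The names KURDISH_JOIN_CONTROLS and encode_heh are invented for this sketch, and it only builds the code point sequences; it makes no claim about how a given renderer will actually shape them or their neighbours.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER
HEH = "\u0647"   # ARABIC LETTER HEH

# (letter, position) -> (controls before HEH, controls after HEH).
# Hypothetical table, copied from the list above; the entries marked "?"
# there are just as tentative here.
KURDISH_JOIN_CONTROLS = {
    ("H", "initial"): ("", ""),
    ("H", "medial"): ("", ""),
    ("H", "final"): (ZWNJ, ""),
    ("H", "isolated"): ("", ZWJ),
    ("E", "initial"): ("", ZWNJ),
    ("E", "medial"): ("", ZWNJ),
    ("E", "final"): ("", ""),
    ("E", "isolated"): ("", ""),
}


def encode_heh(letter, position):
    """Return HEH wrapped in the joining controls listed for this case."""
    before, after = KURDISH_JOIN_CONTROLS[(letter, position)]
    return before + HEH + after


def show(text):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for key in KURDISH_JOIN_CONTROLS:
        print(key, show(encode_heh(*key)))
```

    Note that every control added here sits directly next to a neighbouring letter as well, which is exactly why it affects that letter's form too.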

    So it seems that the compatibility characters in the Arabic Presentation Forms-B block (which select specific forms of the letter HEH) could be a better fit:

    * Kurdish H (left-joining, never joins with the previous character):
        initial= initial Arabic heh (can use U+0647, or maybe U+FEEB=<initial>0647)
        medial= medial Arabic heh (can use U+0647, or maybe U+FEEC=<medial>0647)
        final= initial Arabic heh (*conflict solved by using U+FEEB=<initial>0647)
        isolated= initial Arabic heh (*conflict solved by using U+FEEB=<initial>0647)
    * Kurdish E (right-joining, never joins with the next character):
        initial= isolated Arabic heh (*conflict solved by using U+FEE9=<isolated>0647)
        medial= final Arabic heh (*conflict solved by using U+FEEA=<final>0647)
        final= final Arabic heh (can use U+0647, or maybe U+FEEA=<final>0647)
        isolated= isolated Arabic heh (can use U+0647, or maybe U+FEE9=<isolated>0647)

    This gives the presentation forms a semantic distinction in Kurdish that does not exist in Arabic, where the compatibility characters are treated as semantically identical to the "preferred" codes. It means that the "preference" in Unicode holds only for the Arabic language: for Kurdish, U+0647 would not be recommended, and the letters in Arabic Presentation Forms-B would be preferred.
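
    To make the compatibility relationship concrete, here is a small Python sketch using the standard unicodedata module. The table KURDISH_HEH_FORMS and the helpers describe and fold are hypothetical names; the table simply pins every case from the list above to its U+FEE9..U+FEEC code point.

```python
import unicodedata

# (letter, position) -> presentation-form code point, copied from the list
# above. Hypothetical table for illustration only.
KURDISH_HEH_FORMS = {
    ("H", "initial"): "\uFEEB",
    ("H", "medial"): "\uFEEC",
    ("H", "final"): "\uFEEB",
    ("H", "isolated"): "\uFEEB",
    ("E", "initial"): "\uFEE9",
    ("E", "medial"): "\uFEEA",
    ("E", "final"): "\uFEEA",
    ("E", "isolated"): "\uFEE9",
}


def describe(ch):
    """Code point, character name and compatibility decomposition of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} [{unicodedata.decomposition(ch)}]"


def fold(text):
    """Apply compatibility normalization (NFKC) to text."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for key, ch in KURDISH_HEH_FORMS.items():
        print(key, describe(ch))
    h = KURDISH_HEH_FORMS[("H", "isolated")]
    e = KURDISH_HEH_FORMS[("E", "isolated")]
    # Distinct as encoded, identical after compatibility normalization.
    print(h != e, fold(h) == fold(e))
```

    One consequence worth keeping in mind: any process that applies NFKC or NFKD folds all four code points back to U+0647, so the H/E distinction would not survive compatibility normalization.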

    The other solutions could be:
    * to use invisible format controls that override the joining type of only one character, on the left or on the right of HEH, rather than of both surrounding characters (as ZWNJ/ZWJ do), with the advantage that U+0647 would still be used,
    * or to add some new invisible semantic combining specifiers after HEH to specify whether it means H or E in Kurdish, and so help determine the correct joining type and letterform.



    This archive was generated by hypermail 2.1.5 : Mon Aug 28 2006 - 19:35:52 CDT