Re: Arabic Joining Classes

From: Gregg Reynolds (unicode@arabink.com)
Date: Fri Jun 03 2005 - 14:03:29 CDT

    Andreas Prilop wrote:
    > In http://www.unicode.org/versions/Unicode4.0.0/ch08.pdf
    > I read on p. 15 (= p. 204)
    >
    > | In some cases, characters occur only at the end of words
    > | in correct spelling; they are called trailing characters.
    > | Examples include teh marbuta, alef maksura, and dammatan.
    > | When trailing characters are joining (such as teh marbuta),
    > | they are classified as right-joining, even when similarly
    > | shaped characters are dual-joining.
    >
    > In http://www.unicode.org/Public/UNIDATA/ArabicShaping.txt
    > however, the trailing characters U+0649
    > http://ppewww.ph.gla.ac.uk/~flavell/unicode/unidata06.html#x0649
    > and U+06BA
    > http://ppewww.ph.gla.ac.uk/~flavell/unicode/unidata06.html#x06BA
    > are classified as dual-joining.
    > Why?
    >
    FYI, to add to what Rick sent:

    ALEF MAKSURA is incorrectly named. Or you could also say the glyph is
    incorrect. The term "alef maksura", in Arabic, denotes a grammatical
    category, not a character. It means the (implicit) preceding vowel "a"
    should not be lengthened. It occurs at the end of words, and it takes
    two forms: U+0649 (dotless yeh) and U+0627 (alef). Both denote
    alef maksura in the right circumstances. If Unicode wants to
    call U+0649 ALEF MAKSURA then it should also allow for the alef
    letterform. But it would be better to call it DOTLESS YEH. The
    _letterform_ DOTLESS YEH occurs in all four forms in written Arabic.
    Note that students learning Arabic are usually taught that final dotless
    yeh _is_ alef maksura (it's simpler that way); they do not learn what the
    term means, nor that final alef is also sometimes called maksura. This is
    likely true for the average student in the Arab world too, I would guess.

    Regarding orthography, it is very common to see dotless yeh in final
    position used interchangeably with the (dotted) character YEH (U+064A),
    especially in printed material from Egypt. This is frequent in the
    Quran as well; for example, the common particle "fy" (FEH YEH)
    (pronounced "fee") may be written with dotless yeh instead of dotted
    YEH; in no way could this be considered an ALEF MAKSURA. I suspect
    this reflects typographic aesthetics; it lightens the page a bit. I
    also see this occasionally in office correspondence, which you might
    take as evidence that U+0649 is most "naturally" construed as a YEH form
    by native speakers.
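
    A minimal Python sketch of the two spellings of "fy", assuming only the
    standard unicodedata module. The helper same_after_normalization is a
    made-up name for illustration. It shows that the dotted and dotless
    spellings are distinct code point sequences that Unicode normalization
    does not unify, so any equivalence has to be handled by the application.

```python
import unicodedata

FEH = "\u0641"          # ARABIC LETTER FEH
YEH = "\u064A"          # ARABIC LETTER YEH (dotted)
DOTLESS_YEH = "\u0649"  # named ARABIC LETTER ALEF MAKSURA in Unicode

fy_dotted = FEH + YEH           # "fy" spelled with dotted yeh
fy_dotless = FEH + DOTLESS_YEH  # the same particle spelled with dotless yeh


def same_after_normalization(a: str, b: str, form: str = "NFKC") -> bool:
    """Hypothetical helper: compare two strings after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    print(unicodedata.name(YEH))
    print(unicodedata.name(DOTLESS_YEH))
    # Neither letter has a decomposition mapping, so the spellings stay distinct.
    print(same_after_normalization(fy_dotted, fy_dotless))
```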

    Thomas Milo pointed out that dotless yeh occurs in the middle of words
    in the Quran. I don't know about the "middle", but it does commonly
    carry a "stacked" element at the end of words. For example, it is often
    surmounted by U+0670 (small "dagger" alef above) and/or U+0653 (the
    MADDA mark). You would think of it as the final character in the word,
    but of course the Unicode representation would place it before the
    "stacker" elements.
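
    A small Python sketch of that ordering, assuming the standard unicodedata
    module. The function name describe and the sample string are illustrative
    only. The base letter U+0649 is stored first and the combining mark
    U+0670 follows it, even though a reader sees one word-final letter.

```python
import unicodedata

# Dotless yeh followed by the "dagger" alef that is drawn above it.
word_final = "\u0649\u0670"


def describe(text: str) -> list[tuple[str, str, str]]:
    """Hypothetical helper: list (code point, name, general category) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


if __name__ == "__main__":
    for code_point, name, category in describe(word_final):
        # Category "Lo" is a letter; "Mn" is a nonspacing (combining) mark.
        print(code_point, name, category)
```

    In stored order the letter comes first, so the last code point of such a
    word is the mark, not the letter.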

    Since the name cannot be changed, I would suggest removing the
    language referring to ALEF MAKSURA as a trailing character, and adding a
    note about its relation to YEH phonological semantics. It might also be
    a good idea to add a note to the definition of YEH indicating that the
    dots are a stylistic matter and are optional in certain circumstances.

    -gregg


