Re: Berber and Maghribi letters

From: Roozbeh Pournader (
Date: Thu Apr 30 2009 - 16:04:54 CDT


    On Thu, 2009-04-30 at 15:21 +0200, Titus Nemeth wrote:
    > I have a manuscript in Berber language of the poet Muhammad al-Awzali in
    > Arabic script and want to type it. It contains a few of the "Berber"
    > characters (Kah with three dots below etc.), among them a miniature Ayn.
    > I was not able to find it encoded in the Unicode charts and also the
    > list-archives did not show results to my queries. One of the words that
    > use the letter is for example:
    > "miniature Ayn" (Fatha) + Alif + Yeh (Sukun) + Lam (Shadda/Fatha) + Nun
    > (Sukun)

    Would it be possible for you to upload a scan or photo of the word
    somewhere and send the list a link to the image?

    > I am not familiar with Berber languages which makes it more difficult to
    > find out about this. I saw the use of a Greek Epsilon on a "TAMAZIGHT"
    > website, but doubt that this is conventional.

    It is definitely not conventional, especially if you want to put a Fatha
    over the letter. I would not recommend using the Greek Epsilon for this.

    > The only potential Unicode I found is U+01B9
    > • archaic phonetic for voiced pharyngeal fricative
    > • sometimes typographically rendered with a turned digit 3
    > • recommended spelling 0295 ʕ
    > → 0295 ʕ latin letter pharyngeal voiced fricative
    > → 0639 ع arabic letter ain
    > Yet, I do not understand the relation to Ayn and whether this code would
    > actually be used in my context.

    There are two relations to Arabic Ain:

    * This Latin letter has been historically used to transcribe the sound
    of Ain (note: some/most linguists say that Ain in Arabic is not
    pharyngeal but epiglottal instead).

    * It looks similar to Ain, and its original shape may be based on Ain.

    But U+01B9 should not be used for your purpose either. This is clearly a
    Latin letter.
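
    As a quick way to check which script a candidate character belongs to,
    one can look at its Unicode character name. The sketch below is only
    illustrative: the helper name script_prefix is made up for this example,
    and taking the first word of the name is a rough stand-in for the real
    Script property, which Python's standard library does not expose.

```python
import unicodedata

# Candidate characters mentioned in this thread for the "miniature Ayn".
candidates = {
    "Greek epsilon": "\u03b5",
    "reversed ezh": "\u01b9",
    "pharyngeal fricative": "\u0295",
    "Arabic ain": "\u0639",
}


def script_prefix(ch: str) -> str:
    """Return the first word of the character name, e.g. 'LATIN' or 'ARABIC'.

    Hypothetical helper: a rough approximation of the Script property.
    """
    return unicodedata.name(ch).split()[0]


for label, ch in candidates.items():
    # Print the code point, its formal name, and the approximate script.
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {script_prefix(ch)}")
```

    Only U+0639 reports an Arabic name here; the other candidates are Latin
    or Greek letters and will not behave like Arabic letters in text.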

    > Moreover, I wonder about the encoding of Feh with dot below (06A2) and
    > the Qaf with a single dot above (06A7). As far as I have understood
    > (correct me if I'm wrong), those two letters are only graphically
    > distinct from the regular Feh (0641) and Qaf (0642).

    Unicode tends to encode the Arabic script more graphically than some
    would expect.

    Another commonly cited case is that of U+0643 ARABIC LETTER KAF vs
    U+06A9 ARABIC LETTER KEHEH. In some languages, the glyph shapes used in
    the Unicode charts are both considered OK, while there is usually a
    preference for one of the forms over the other.

    There are various reasons some of these pairs have been encoded
    separately. Some languages may use both forms with a phonemic or
    semantic difference. For example, while U+06CC ARABIC LETTER FARSI YEH
    and U+06D2 ARABIC LETTER YEH BARREE are considered graphical variants
    in Persian, their distinction is important in various South Asian
    languages written in the Arabic script.

    Generally, I would recommend encoding the text graphically if your
    readers are specialists: if the source material puts a dot under the
    Feh, use U+06A2. That way, you keep the distinction made in the source
    material. You can also provide a standardized/simplified version to
    ease searching with software tools that don't know there is a relation
    between U+06A2 and U+0641, or for cases when fonts to render the text
    are hard to find.
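
    A minimal sketch of how such a simplified version could be derived from
    the graphically encoded text. The table MAGHRIBI_FOLD and the function
    search_key are names invented for this example, and the table covers only
    the two letters discussed in this thread. An explicit table is needed
    because these letters have no decomposition mappings, so Unicode
    normalization alone leaves them unchanged.

```python
import unicodedata

# Hypothetical folding table: Maghribi letter forms -> common forms.
MAGHRIBI_FOLD = str.maketrans({
    "\u06a2": "\u0641",  # FEH WITH DOT MOVED BELOW -> FEH
    "\u06a7": "\u0642",  # QAF WITH DOT ABOVE -> QAF
})


def search_key(text: str) -> str:
    """Return a simplified copy of text for searching (illustrative only)."""
    # Normalize first so combining marks are in a consistent order,
    # then fold the graphically distinct letters to their common forms.
    return unicodedata.normalize("NFC", text).translate(MAGHRIBI_FOLD)


# Normalization by itself does not relate U+06A2 to U+0641.
print(unicodedata.normalize("NFKC", "\u06a2") == "\u06a2")
print(search_key("\u06a2") == "\u0641")
```

    The graphically encoded text stays the master copy; the folded string
    is only an index key and should not replace the transcription.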

    Still, if the text is to be read by the general public only, you may
    want to use only the standardized orthography of the common language.
    For example, if I were typing a classic Persian poem from a manuscript
    for my weblog, I would use U+06AF ARABIC LETTER GAF. I would only use
    U+063C in documents where I wish to discuss the specific classical
    orthography that used three dots under the letter skeleton.

    > I also wondered whether the Unicode values for these letters are actually
    > used by anyone?

    Oh, definitely. To cite a commonly available resource, you can usually
    find Wikipedia articles using such characters easily.

    But fonts and keyboards are usually the barrier to adoption of Unicode
    characters. Until there is an easy way to enter and display a certain
    character, users tend to avoid it.

