Re: Arabic Script: A new Hamza is required for Urdu and Sindhi

From: Michael Fayez
Date: Sat Sep 17 2005 - 05:47:41 CDT

    From:  Gregg Reynolds <>
    To:  Michael Fayez <>
    Subject:  Re: Arabic Script: A new Hamza is required for Urdu and Sindhi
    Date:  Fri, 16 Sep 2005 17:42:14 -0500
    >Michael Fayez wrote:
    >>For every feminine singular word ending in teh marbuta, appending a
    >>possessive pronoun converts the teh marbuta into a normal teh,
    >>as in سيارة (sayarah, "a car") and سيارتي (sayaraty, "my car"),
    >>not سيارةي. We also have to train the user to change the
    >>teh marbuta into teh. So according to your suggestion it would be
    >>good to have a letter ‍ ة ‍ة ـتـ (no initial form) or even
    >>modify the properties of the existing letter. But this will only
    >>add more visual ambiguity to Unicode. When a medial teh
    >>is seen in a word, is it a teh marbuta in its medial
    >>form or a normal teh in its medial form? I will never know.
    >On the contrary, it would make Unicode accurately reflect the actual
    >relationship between character identity and glyphic form in Arabic.
    >If you look at written Arabic text, there is nothing to tell you
    >that the t in sayaraty is actually a non-lexical ta ta'neeth.  Only
    >educated literates know this, because they bring their knowledge to
    >the page. It's no different when looking at a computer monitor.
    >It goes back to character identity v. form.  The t in sayaraty and
    >the "teh marboota" in sayarah are the *same character*.  These words
    >are under the same dictionary entry.  To look up sayaraty, you don't
    >look under syrt; you look under syr, where teh marboota is a
    >subentry. That's the key.  That is how we can say that e.g. medial
    >and isolated yeh - two totally unrelated graphical forms - are two
    >forms of the *same character*.
    >So the legacy model of teh marboota unfortunately leads developers
    >into a mistaken notion of the language.  What's worse is its effect
    >on search/sort.  It distorts sorting.  E.g.  should sort with  and
    >before  but if the t in durraty is "normal" t, it will sort after
    >the latter.  For searching, a search on teh marboota should match
    >even when there is a suffix.
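
    To make the search point in the quoted text concrete: with the current
    encoding, an application could approximate this behaviour by folding
    teh marbuta (U+0629) to teh (U+062A) before comparing. The Python below
    is only a sketch under that assumption. The names fold_teh and
    search_matches are made up for illustration, and the folding
    over-matches, because it cannot tell a lexical teh from a converted
    teh marbuta.

```python
TEH_MARBUTA = "\u0629"  # ARABIC LETTER TEH MARBUTA
TEH = "\u062A"          # ARABIC LETTER TEH


def fold_teh(text: str) -> str:
    """Fold teh marbuta to teh so that both compare equal in a search."""
    return text.replace(TEH_MARBUTA, TEH)


def search_matches(query: str, text: str) -> bool:
    """Return True if the folded query occurs anywhere in the folded text."""
    return fold_teh(query) in fold_teh(text)


# sayarah ("a car"): seen, yeh, alef, reh, teh marbuta
car = "\u0633\u064A\u0627\u0631\u0629"
# sayaraty ("my car"): seen, yeh, alef, reh, teh, yeh
my_car = "\u0633\u064A\u0627\u0631\u062A\u064A"

# A plain substring test misses the suffixed form; the folded test finds it.
plain_hit = car in my_car
folded_hit = search_matches(car, my_car)
```

    Here plain_hit is False and folded_hit is True, so a search on the
    dictionary form finds the suffixed form only after folding.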

    OK, but then we would need to change the properties of the Arabic encoding. There should at least be some evidence to justify the change, not just a grammatical rule: an authoritative Arabic grammar book that says that

    1- teh marbuta has a medial form

    2- teh marbuta in its medial form has the same shape as teh in its medial form

    The same should hold for Sindhi.

    >>Plus it will expose the already established Arabic encoding to many
    >>problems as not only Arabic and Sindhi will have their special
    >>artificial letters (artificial because they are not part of the
    >>alphabet taught to children in school or in books)  but also all
    >>the other languages using the Arabic script will have their special
    >I'm not sure what you mean here.  Are you saying teh marbuta is not
    >taught to children?

    This is not what I meant. What I meant is that children in school are taught that teh marbuta always comes at the end of a word. They are never taught any case in which teh marbuta appears in the middle or at the beginning of a word. When teh marbuta is followed by a possessive pronoun, it is converted into teh and is not differentiated from the normal teh letter. This has nothing to do with Unicode or any other character encoding, as teaching Arabic (at least when I was a child) does not involve computers. Accepting Mr. Lateef Sagar's proposal would lead to artificial characters that are not taught in schools and not found in grammar books.
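
    The rule as taught can be written down in a few lines. The Python
    below is a sketch only: add_first_person_possessive is a hypothetical
    helper name, it handles just the first-person suffix yeh (U+064A),
    and it assumes unvocalized text with no diacritics after the final
    letter.

```python
import unicodedata

TEH_MARBUTA = "\u0629"
TEH = "\u062A"
YEH = "\u064A"  # first-person singular possessive suffix


def add_first_person_possessive(noun: str) -> str:
    """Append the 'my' suffix, converting a final teh marbuta to teh."""
    if noun.endswith(TEH_MARBUTA):
        noun = noun[:-1] + TEH
    return noun + YEH


# sayarah ("a car") becomes sayaraty ("my car")
car = "\u0633\u064A\u0627\u0631\u0629"
my_car = add_first_person_possessive(car)

# The two letters are distinct characters in the current encoding.
names = (unicodedata.name(TEH_MARBUTA), unicodedata.name(TEH))
```

    After the conversion the stored text contains an ordinary teh, so
    nothing in the encoded word records that it came from a teh marbuta.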

    >>As I live in Egypt, which is the largest Arabic-speaking country in
    >>the world (70 million), I have never heard such a complaint (for my
    >>example, of course), and everyone here is simply satisfied with the
    >>Arabic encoding
    >Beware of taking silence for approval or even satisfaction.  There
    >is no alternative; that doesn't mean people are happy with the
    >legacy encoding, only that they have no choice.  In my experience,
    >there is plenty of dissatisfaction with Arabic-enabled software, but
    >at the same time since hardly anybody has any idea of what Unicode
    >is, it escapes blame.

    There is dissatisfaction with Arabic-enabled programs, but it is not always related to the encoding: bad fonts, software that does not display Arabic properly, text in old encodings that shows up on modern systems as strange, meaningless letters, bad translation of English into Arabic, and so on. What exactly is the reason for dissatisfaction with the encoding?

    >Semantically, it (teh marboota) (always?) denotes feminine
    >grammatical gender.

    Not always. There are a very few words that end with teh marbuta and are masculine, like أسامة (Usamah, an archaic word meaning "lion" and a common name, as in Usamah bin Laden). As a rule, consider a word ending with teh marbuta feminine until you are told or know otherwise.

    Michael Fayez
