Re: Arabic Script: A new Hamza is required for Urdu and Sindhi

From: Lateef Sagar
Date: Sun Sep 18 2005 - 06:03:21 CDT


    Gentlemen, my question was about including a new Hamza for Sindhi and Urdu. The issue with teh marboota might be solved easily if a medial shape is introduced. But my initial question may not be solved that way, because all the Hamzas in Unicode are used in Arabic, and introducing a separate shape for the existing Hamzas will definitely cause issues.
    The Hamza that is taught to children in schools for Sindhi and Urdu is just one, with the shapes that I mentioned in my earlier email. If you like, I can post a scanned picture from textbooks for clarification. All other characters required for Sindhi and Urdu are very well supported by Unicode. The local printing industry is using Unicode-based software, and web development houses are making Unicode-based Sindhi and Urdu web sites. But as I mentioned in my earlier email, sorting and searching now create problems for words with Hamza, particularly verbs and their forms. In Arabic grammar there might be a rule for replacing teh marboota with teh when initial or medial forms are required. But since there is only one Hamza in Urdu and Sindhi, and there is no Hamza above yeh in these languages, no such rule exists in Sindhi and Urdu grammar that says to change the isolated hamza to hamza above yeh when initial or medial forms are required.
    I want to know your opinion so that I can push the Urdu Language Authority, the Sindhi Language Authority and CRULP to submit a detailed proposal for a new hamza.
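    The search problem described above can be illustrated with a minimal Python sketch. It assumes the two code points involved are U+0621 (ARABIC LETTER HAMZA) and U+0626 (ARABIC LETTER YEH WITH HAMZA ABOVE); the sample string is a hypothetical letter sequence chosen for illustration, not a real Urdu or Sindhi word, and the helper names are invented here.

```python
import unicodedata

HAMZA = "\u0621"       # isolated hamza
YEH_HAMZA = "\u0626"   # yeh with hamza above, used where a joining form is needed

# Print the character names Unicode assigns to the two code points.
for ch in (HAMZA, YEH_HAMZA):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

def contains_hamza_naive(word: str) -> bool:
    """Plain code point search: only finds U+0621."""
    return HAMZA in word

def contains_hamza_folded(word: str) -> bool:
    """Treat U+0626 as a form of hamza (an assumption made for this sketch)."""
    return HAMZA in word or YEH_HAMZA in word

# Hypothetical sequence: beh + yeh-with-hamza-above + alef (not a real word).
sample = "\u0628\u0626\u0627"

print(contains_hamza_naive(sample))
print(contains_hamza_folded(sample))
```

    A user who thinks of the medial letter as "the hamza" and searches for U+0621 gets no match under the naive comparison; the folded comparison is one possible workaround at the application level, not a statement about how any existing software behaves.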
    Thanks and regards

    Michael Fayez <> wrote:

    From: Gregg Reynolds <>
    To: Michael Fayez <>
    Subject: Re: Arabic Script: A new Hamza is required for Urdu and Sindhi
    Date: Fri, 16 Sep 2005 17:42:14 -0500
    >Michael Fayez wrote:
    >>For all feminine singular words ending with teh marboota, when the
    >>possessive pronoun is appended, the teh marbuta is converted into
    >>normal teh
    >>Like سيارة (sayarah - a car) سيارتي (sayaraty – my
    >>car) not سيارةي we also have to train the user to change the
    >>teh marboota into teh. So according to your suggestion it would be
    >>good to have a letter ‍ ة ‍ة ـتـ (no initial form) or even
    >>modify the properties of the existing letter. But this will only
    >>add more visual ambiguity to the Unicode. When the medial teh
    >>letter is seen in a word would it be a teh marboota in its medial
    >>form or a normal teh in its medial form??? I will never know.
    >On the contrary, it would make Unicode accurately reflect the actual
    >relationship between character identity and glyphic form in Arabic.
    >If you look at written Arabic text, there is nothing to tell you
    >that the t in sayaraty is actually a non-lexical ta ta'neeth. Only
    >educated literates know this, because they bring their knowledge to
    >the page. It's no different when looking at a computer monitor.
    >It goes back to character identity v. form. The t in sayaraty and
    >the "teh marboota" in sayarah are the *same character*. These words
    >are under the same dictionary entry. To look up sayaraty, you don't
    >look under syrt; you look under syr, where teh marboota is a
    >subentry. That's the key. That is how we can say that e.g. medial
    >and isolated yeh - two totally unrelated graphical forms - are two
    >forms of the *same character*.
    >So the legacy model of teh marboota unfortunately leads developers
    >into a mistaken notion of the language. What's worse is its effect
    >on search/sort. It distorts sorting. E.g. should sort with and
    >before but if the t in durraty is "normal" t, it will sort after
    >the latter. For searching, a search on teh marboota should match
    >even when there is a suffix.
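    The search behaviour asked for in the quoted text can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and folding teh marbuta (U+0629) to teh (U+062A) before comparing is one simple way to get the match, not something either poster proposed as an implementation.

```python
TEH_MARBUTA = "\u0629"  # teh marbuta
TEH = "\u062A"          # teh

def fold_teh_marbuta(text: str) -> str:
    """Map teh marbuta to teh so both spellings compare equal."""
    return text.replace(TEH_MARBUTA, TEH)

def stem_matches(stem: str, word: str) -> bool:
    """True if `word` starts with `stem` once teh marbuta is folded to teh."""
    return fold_teh_marbuta(word).startswith(fold_teh_marbuta(stem))

# The two words from the example in this thread.
car = "\u0633\u064A\u0627\u0631\u0629"           # sayarah, "a car"
my_car = "\u0633\u064A\u0627\u0631\u062A\u064A"  # sayaraty, "my car"

print(my_car.startswith(car))     # raw code point comparison
print(stem_matches(car, my_car))  # comparison after folding
```

    With the current encoding the raw comparison fails because the suffixed form is stored with U+062A. The folded comparison succeeds, but it also matches stems that end in an ordinary teh, which is the ambiguity raised earlier in the thread.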

    Ok. But we will need to change the properties of the encoding of Arabic. At least there should be some evidence to justify changing it, not just a grammatical rule. There should be an authoritative Arabic grammar book that says that:

    1- teh marbuta has a medial form

    2- teh marbuta in its medial form has the same shape as teh in its medial form

    And the same should be for Sindhi.


    >>Plus it will expose the already established Arabic encoding to many
    >>problems as not only Arabic and Sindhi will have their special
    >>artificial letters (artificial because they are not part of the
    >>alphabet taught to children in school or in books) but also all
    >>the other languages using the Arabic script will have their special
    >I'm not sure what you mean here. Are you saying teh marbuta is not
    >taught to children?

    This is not what I meant. What I meant is: children in school are taught that teh marbuta is always at the end of the word. They are never taught any case where teh marbuta is in the middle or at the beginning of a word. When teh marbuta is followed by a possessive pronoun, it is converted into teh and is not differentiated from the normal teh letter. This has nothing to do with Unicode or any other character encoding, as teaching Arabic (at least when I was a child) does not involve computers. Accepting the proposal of Mr. Lateef Sagar will lead to artificial characters not taught in schools and not found in grammar books.

    >>As I live in Egypt, which is the largest Arabic-speaking country in
    >>the world (70 million), I have never heard such a complaint (for my
    >>example of course) and everyone here is just satisfied with the
    >>Arabic encoding
    >Beware of taking silence for approval or even satisfaction. There
    >is no alternative; that doesn't mean people are happy with the
    >legacy encoding, only that they have no choice. In my experience,
    >there is plenty of dissatisfaction with Arabic enabled software, but
    >at the same time since hardly anybody has any idea of what Unicode
    >is, it escapes blame.

    There is dissatisfaction with Arabic-enabled programs, but it is not always related to the encoding (bad fonts, failure to display Arabic properly, old encodings displaying as strange meaningless letters on modern systems, bad translation of English into Arabic, etc.). What exactly is the reason for the dissatisfaction with the encoding?

    >Semantically, it (teh marboota) (always?) denotes feminine
    >grammatical gender.

    Not always. There are a few words that end with teh marbuta and are masculine, like أسامة (Usamah, an archaic word meaning lion and a common name, as in Usamah bin Laden). Usually, consider a word ending with teh marboota feminine until you are told or know otherwise.

    Michael Fayez


    Lateef Sagar Shaikh

    This archive was generated by hypermail 2.1.5 : Sun Sep 18 2005 - 06:05:53 CDT