Re: Arabic Script: A new Hamza is required for Urdu and Sindhi

From: Gregg Reynolds (unicode@arabink.com)
Date: Fri Sep 16 2005 - 17:42:14 CDT


    Michael Fayez wrote:
    >
    > From: /Lateef Sagar <lateef_sagar@yahoo.com>/
    > To: /unicode@unicode.org/
    > Subject: /Arabic Script: A new Hamza is required for Urdu and Sindhi/
    > Date: /Thu, 15 Sep 2005 03:38:03 -0700 (PDT)/
    >>Hi List,
    >
    > hi,
    >>I suggest a new Hamza for Urdu and Sindhi.
    >>
    ...
    >
    > The problem you are presenting also has an equivalent in Arabic.
    >
    > All feminine singular words ending in teh marboota change it into a
    > normal teh when a possessive pronoun is appended.
    >
    > For example, سيارة (sayarah, "a car") becomes سيارتي (sayaraty, "my
    > car"), not سيارةي. We also have to train the user to change the teh
    > marboota into teh. So according to your suggestion it would be good to
    > have a letter ‍ ة ‍ة ـتـ (no initial form) or even modify the
    > properties of the existing letter. But this will only add more visual
    > ambiguity to Unicode. When a medial teh is seen in a word, is it a
    > teh marboota in its medial form or a normal teh in its medial form?
    > I will never know.
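
    Under the existing encoding, the rewrite described above is a plain
    string substitution that the writer (or an input method) must perform.
    A minimal Python sketch, assuming the only case handled is a stem that
    ends in U+0629 ARABIC LETTER TEH MARBUTA; the helper name
    `attach_suffix` is hypothetical:

```python
TEH_MARBUTA = "\u0629"  # ARABIC LETTER TEH MARBUTA
TEH = "\u062A"          # ARABIC LETTER TEH


def attach_suffix(stem: str, suffix: str) -> str:
    """Append a suffix, rewriting a final teh marbuta as teh (hypothetical helper)."""
    if suffix and stem.endswith(TEH_MARBUTA):
        # Legacy model: the final teh marbuta must become a different character.
        stem = stem[:-1] + TEH
    return stem + suffix


sayarah = "\u0633\u064A\u0627\u0631\u0629"      # sayarah, "a car"
sayaraty = attach_suffix(sayarah, "\u064A")     # sayaraty, "my car"
# sayaraty == "\u0633\u064A\u0627\u0631\u062A\u064A"
```

    The point of contention in the thread is whether this substitution
    should exist at all, or whether the suffixed form should keep the same
    character and leave the shape change to rendering.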

    On the contrary, it would make Unicode accurately reflect the actual
    relationship between character identity and glyphic form in Arabic. If
    you look at written Arabic text, there is nothing to tell you that the t
    in sayaraty is actually a non-lexical ta ta'neeth. Only educated
    literates know this, because they bring their knowledge to the page.
    It's no different when looking at a computer monitor.

    It goes back to character identity v. form. The t in sayaraty and the
    "teh marboota" in sayarah are the *same character*. These words are
    under the same dictionary entry. To look up sayaraty, you don't look
    under syrt; you look under syr, where teh marboota is a subentry.
    That's the key. That is how we can say that e.g. medial and isolated
    yeh - two totally unrelated graphical forms - are two forms of the *same
    character*.
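
    Unicode's own compatibility data already encodes this view for yeh: the
    Arabic Presentation Forms-B block has separate code points for its
    isolated, final, initial and medial shapes, and each one has a
    compatibility decomposition to the single character U+064A ARABIC
    LETTER YEH. A short sketch using Python's standard `unicodedata` module
    (the helper name `base_character` and the choice of examples are
    illustrative):

```python
import unicodedata


def base_character(form: str) -> str:
    """Map a presentation-form code point to its nominal character via NFKC."""
    return unicodedata.normalize("NFKC", form)


# Yeh: isolated, final, initial, medial presentation forms (U+FEF1..U+FEF4).
# Teh marbuta: only isolated and final presentation forms exist (U+FE93, U+FE94).
for form in "\uFEF1\uFEF2\uFEF3\uFEF4\uFE93\uFE94":
    base = base_character(form)
    print(f"U+{ord(form):04X} {unicodedata.name(form)} -> "
          f"U+{ord(base):04X} {unicodedata.name(base)}")
```

    All four yeh shapes fold to U+064A, while teh marbuta has no initial or
    medial presentation form at all, which reflects the legacy model under
    discussion.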

    So the legacy model of teh marboota unfortunately leads developers into
    a mistaken notion of the language. What's worse is its effect on
    search/sort. It distorts sorting. E.g. should sort with and
    before but if the t in durraty is "normal" t, it will sort after the
    latter. For searching, a query for a word ending in teh marboota should
    also match the word's suffixed forms (sayarah should find sayaraty).
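
    With the legacy encoding, one workaround is to fold teh marbuta to teh
    before comparing. A minimal sketch, assuming simple substring search;
    `fold` and `matches` are hypothetical names. The folding is lossy: it
    treats teh marbuta and lexical teh as identical everywhere, so it can
    over-match.

```python
TEH_MARBUTA = "\u0629"  # ARABIC LETTER TEH MARBUTA
TEH = "\u062A"          # ARABIC LETTER TEH


def fold(text: str) -> str:
    """Search/sort key that treats teh marbuta as teh (lossy, hypothetical helper)."""
    return text.replace(TEH_MARBUTA, TEH)


def matches(query: str, text: str) -> bool:
    """Substring match performed on folded strings."""
    return fold(query) in fold(text)


sayarah = "\u0633\u064A\u0627\u0631\u0629"         # sayarah
sayaraty = "\u0633\u064A\u0627\u0631\u062A\u064A"  # sayaraty
print(matches(sayarah, sayaraty))  # True
print(sayarah in sayaraty)         # False: a plain search misses the suffixed form
```

    The same `fold` can be passed as the `key` argument to `sorted`, so that
    teh marbuta and teh compare as equal when ordering words.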

    > Plus it will expose the already
    > established Arabic encoding to many problems as not only Arabic and
    > Sindhi will have their special artificial letters (artificial because
    > they are not part of the alphabet taught to children in school or in
    > books) but also all the other languages using the Arabic script will
    > have their special cases.

    I'm not sure what you mean here. Are you saying teh marbuta is not
    taught to children?
    >
    > As I live in Egypt, which is the largest Arabic-speaking country in the
    > world (70 million people), I have never heard such a complaint (about my
    > example, of course), and everyone here is simply satisfied with the Arabic encoding

    Beware of taking silence for approval or even satisfaction. There is no
    alternative; that doesn't mean people are happy with the legacy
    encoding, only that they have no choice. In my experience, there is
    plenty of dissatisfaction with Arabic-enabled software, but at the same
    time since hardly anybody has any idea of what Unicode is, it escapes blame.
    >
    > The example you gave and the example I gave are like the case of English
    > verbs ending in e, such as "specialize": it is not the responsibility
    > of the Unicode standard to drop the final e when -ation is added to form
    > the noun, as in "specialization". I think

    This is nothing like the example of teh marbuta. What you describe is a
    change in morphology and thus semantics, which is properly outside the
    scope of Unicode. That is not the case with teh marboota shaping, nor
    with the Sindhi case, apparently.

    This issue came up a few months ago, when I speculated that teh marboota
    might be considered a kind of morphophoneme. I was wrong; it is just a
    character with multiple shapes, just like the other shaped characters in
    Arabic. It always denotes /t/, even though in practice this phoneme is
    frequently dropped at word endings. Unlike teh, it has secondary
    lexical semantics, i.e. it has no distinct entry in the lexicon.
    Semantically, it (always?) denotes feminine grammatical gender. (You
    know all this but I add it for those readers unfamiliar with Arabic.)

    -gregg



    This archive was generated by hypermail 2.1.5 : Fri Sep 16 2005 - 17:44:21 CDT