Re: Arabic Script: A new Hamza is required for Urdu and Sindhi

From: Gregg Reynolds (unicode@arabink.com)
Date: Fri Sep 16 2005 - 17:42:14 CDT

Michael Fayez wrote:
>
> From: /Lateef Sagar <lateef_sagar@yahoo.com>/
> To: /unicode@unicode.org/
> Subject: /Arabic Script: A new Hamza is required for Urdu and Sindhi/
> Date: /Thu, 15 Sep 2005 03:38:03 -0700 (PDT)/
>>Hi List,
>
> hi,
>>I suggest a new Hamza for Urdu and Sindhi.
>>
...
>
> The problem you are presenting also have an equivalent in Arabic.
>
> All feminine singular words ending with teh marboota when appending the
> possessive pronoun to them the teh marbuta is converted into normal teh
>
> Like سيارة (sayarah - a car) -> سيارتي (sayaraty - my car)
> not سيارةي we also have to train the user to change the teh
> marboota into teh. So according to your suggestion it would be good to
> have a letter ة ‍ة ـتـ (no initial form) or even modify the
> properties of the existing letter. But this will only add more visual
> ambiguity to the Unicode. When the medial teh letter is seen in a word
> would it be a teh marboota in its medial form or a normal teh in its
> medial form??? I will never know.

On the contrary, it would make Unicode accurately reflect the actual
relationship between character identity and glyphic form in Arabic. If
you look at written Arabic text, there is nothing to tell you that the t
in sayaraty is actually a non-lexical ta ta'neeth. Only educated
literates know this, because they bring their knowledge to the page.
It's no different when looking at a computer monitor.

It goes back to character identity v. form. The t in sayaraty and the
"teh marboota" in sayarah are the *same character*. These words are
under the same dictionary entry. To look up sayaraty, you don't look
under syrt; you look under syr, where teh marboota is a subentry.
That's the key. That is how we can say that e.g. medial and isolated
yeh - two totally unrelated graphical forms - are two forms of the *same
character*.
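
One place where this identity/form distinction is already visible is in the
compatibility presentation forms. The sketch below is only an illustration,
not part of the proposal: it assumes Python's standard unicodedata module and
uses NFKC folding to show that the four positional forms of yeh all map back
to one character, while teh marbuta has only isolated and final presentation
forms. The helper name fold is made up for this example.

```python
import unicodedata

YEH = "\u064A"          # ARABIC LETTER YEH
TEH_MARBUTA = "\u0629"  # ARABIC LETTER TEH MARBUTA

# Presentation forms: isolated, final, initial, medial yeh.
yeh_forms = ["\uFEF1", "\uFEF2", "\uFEF3", "\uFEF4"]
# Teh marbuta has only isolated and final presentation forms.
teh_marbuta_forms = ["\uFE93", "\uFE94"]

def fold(ch):
    """Hypothetical helper: fold a presentation form to its base character."""
    return unicodedata.normalize("NFKC", ch)

# Every positional shape folds to the same underlying character.
yeh_folded = {fold(ch) for ch in yeh_forms}
teh_marbuta_folded = {fold(ch) for ch in teh_marbuta_forms}

print(yeh_folded == {YEH})
print(teh_marbuta_folded == {TEH_MARBUTA})
```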

So the legacy model of teh marboota unfortunately leads developers into
a mistaken notion of the language. What's worse is its effect on
search/sort. It distorts sorting. E.g. درتي should sort with درة and
before درب but if the t in durraty is "normal" t, it will sort after the
latter. For searching, a search on teh marboota should match even when
there is a suffix.
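
A rough sketch of the sort/search point, under loudly stated assumptions:
durraty_proposed is a hypothetical spelling in which teh marbuta (U+0629) is
kept before the suffix, which is not how text is encoded today, and stem_key
is a made-up sort key that groups a word with the stem before its teh
marbuta. It is not UCA collation or any real library's behaviour.

```python
TEH_MARBUTA = "\u0629"

durra = "\u062F\u0631\u0629"                    # درة
darb = "\u062F\u0631\u0628"                     # درب
durraty_legacy = "\u062F\u0631\u062A\u064A"     # درتي, spelled with ordinary teh
durraty_proposed = "\u062F\u0631\u0629\u064A"   # hypothetical: teh marbuta kept

def stem_key(word):
    """Hypothetical key: text before the first teh marbuta, then the full word."""
    stem, _, _ = word.partition(TEH_MARBUTA)
    return (stem, word)

# With ordinary teh, durraty has no teh marbuta to split on and lands after darb.
legacy_order = sorted([durraty_legacy, darb, durra], key=stem_key)

# With teh marbuta retained, durraty sorts with durra and before darb.
proposed_order = sorted([durraty_proposed, darb, durra], key=stem_key)

# A plain substring search for durra matches only the hypothetical spelling.
legacy_match = durra in durraty_legacy
proposed_match = durra in durraty_proposed
```

Here legacy_order comes out as durra, darb, durraty, while proposed_order
comes out as durra, durraty, darb; the substring search finds durra only in
the hypothetical spelling.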

> Plus it will expose the already
> established Arabic encoding to many problems as not only Arabic and
> Sindhi will have their special artificial letters (artificial because
> they are not part of the alphabet taught to children in school or in
> books) but also all the other languages using the Arabic script will
> have their special cases.

I'm not sure what you mean here. Are you saying teh marbuta is not
taught to children?
>
> As I live in Egypt which is the largest Arabic speaking country in the
> world (70 millions), I never heard such complaint (for my example of
> course) and every one here is just satisfied with the Arabic encoding

Beware of taking silence for approval or even satisfaction. There is no
alternative; that doesn't mean people are happy with the legacy
encoding, only that they have no choice. In my experience, there is
plenty of dissatisfaction with Arabic-enabled software, but at the same
time, since hardly anybody has any idea of what Unicode is, it escapes blame.
>
> The example you gave and the example I gave are like the case in English
> in verbs ending with e like the verb "specialize" it is not the
> responsibility of the Unicode standard to eliminate the final e when an
> -ation is added to form the noun like in "specialization". I think

This is nothing like the example of teh marbuta. What you describe is a
change in morphology and thus semantics, which is properly outside the
scope of Unicode. That is not the case with teh marboota shaping, nor
with the Sindhi case, apparently.

This issue came up a few months ago, when I speculated teh marboota
might be considered a kind of morphophoneme. I was wrong; it is just a
character with multiple shapes, just like the other shaped characters in
Arabic. It always denotes /t/, even though in practice this phoneme is
frequently dropped at word endings. Unlike teh, it has secondary
lexical semantics, i.e. it has no distinct entry in the lexicon.
Semantically, it (always?) denotes feminine grammatical gender. (You
know all this but I add it for those readers unfamiliar with Arabic.)

-gregg

This archive was generated by hypermail 2.1.5 : Fri Sep 16 2005 - 17:44:21 CDT