ta' marbuta

From: Gregg Reynolds (greynolds@greynolds.com)
Date: Sun Aug 22 1999 - 22:38:40 EDT

Forgive me if you've already addressed this, but I looked in the online
files for version 3.0, and it looks to me like ta' marbuta is still

_IF_ Unicode encodes visual forms, then the "behaviour" of ta' marbuta
as currently described is accurate, but the name is fundamentally
misleading, and the codepoint U+0629 must be construed as denoting a
graphemic unit (shape), and not an "abstract character"; but

_IF_ Unicode encodes abstract characters, and _IF_ codepoint U+0629
indeed denotes "ARABIC LETTER TEH MARBUTA", then Arabic joining table
6-7 is incorrect, and TEH MARBUTA should be moved to a different class
that includes the medial form.

RATIONALE: Arabic "ta marbuta" is a term that reflects grammatical
analysis; the dotted heh form is not usually included among the letters
of the Arabic alphabet. Literally, "ta' marbuta" means "binding t".
Table 6-7 gets it right for the isolated and final forms (same as heh,
but with two dots superfixed), but excludes the medial form, which is
identical with the medial form of ta (the ba group). That's why they
call it the _ta'_ marbuta. To an Arab literate, "ta' marbuta" may
assume any of the forms isolated dotted heh, final dotted heh, or medial

EXAMPLE: Consider the word "risAla", meaning missive, letter, message.
The final character is the ta marbuta. Now the important thing to
understand is that Latin transcriptions generally don't know what to do
with the ta marbuta. In ordinary speech, our word would be spoken as I
wrote it above, with a final "a" sound. However, the written form of
the word uses the final ta marbuta, and fully phonated the word would be
read with a 't' sound, followed by a vowel sound depending on the
context. Example: "risAlatun", indefinite singular. But with a
suffixed pronoun denoting the possessive, we would have (assuming masc.
sing. for the pronoun) "risAlatuhu", "his letter". In this case the
word is written with medial ta. With a definite article, the ta marbuta
loses its tanween, but still may be phonated: al-risAla#u.

To make this more clear (I hope), here is the same thing, only written
with "#" representing the ta marbuta:

    1) risAla#
    2) risAla#un
    3) risAla#uhu
    4) risAla#u ~l-gufrAn ( "~l-" = definite article "al" with alif
al-wasla, so the "a" of "al" is replaced (phonetically) by the final
vowel of the preceding word: risAlatulgufrAn.)

Both examples 1 and 2 could be pronounced the same way, with or without
the phonated ta marbuta. In other words, the "tanween" (the -un suffix
that denotes indefiniteness in nouns) is there, whether it is explicitly
written or not; and if it is explicitly written, one need not
necessarily pronounce it. We're shading over into pedagogical theory
now, since some programs do not teach proper (or at least full)
grammatical reading; this is closely connected to the problem of
diglossia in the Arab world (literary v. colloquial). Nonetheless, there
is no controversy over the grammar involved, only how it should be

In both examples 3 and 4, the sound 't' must be pronounced, since it
serves an indispensable grammatical function in the situation. However,
in example 3 it would also be written as a medial ta', whereas in
example 4 it would be written as a final dotted ha.
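
The rule for the four examples can be sketched in a few lines of Python. This is only an illustration in my ad hoc transliteration, not anything taken from the Unicode spec: the function name written_form and the vowel and tanween lists are assumptions made up for the sketch, and it looks at one word at a time (for example 4, only the first word "risAla#u").

```python
# Hypothetical helper using the transliteration of this message:
# "#" is ta' marbuta, a/i/u are short vowels, un/an/in are tanween.
SHORT_VOWELS = ("a", "i", "u")
TANWEEN = ("un", "an", "in")


def written_form(word):
    """Return the written shape of the '#' in a single transliterated word."""
    stem, marker, tail = word.partition("#")
    if not marker:
        raise ValueError("no ta' marbuta marker in %r" % word)
    # Tanween is vowelling, not a letter, so "#" is still the last letter.
    if tail in TANWEEN:
        return "dotted heh"
    # Drop one case vowel; anything left over is a suffixed letter.
    if tail.startswith(SHORT_VOWELS):
        tail = tail[1:]
    return "medial ta" if tail else "dotted heh"


for example in ["risAla#", "risAla#un", "risAla#uhu", "risAla#u"]:
    print(example, "->", written_form(example))
```

Only "risAla#uhu" comes out as medial ta; the other three keep the dotted heh, which matches the description above.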

It's worth noting that the "short" vowels are relatively unimportant in
Arabic, due to the structure of the language. Used to drive me nuts
when my Arabic teacher would tell me that, since in English vowels are
so crucial to meaning, but I've come to understand a bit more
intuitively how that should be. For example, changing the short vowel
in an English word almost always changes the lexical category, whereas
in Arabic it almost never does. Example: bad, bed, bid, bod, bud.
Five vowels, five completely distinct lexical items. In Arabic, about
the worst that could happen if you said 'u' where you should have said
'a' is confusion between active and passive voices: katab "he wrote",
kutib "it was written". If you were to say "kitub", "kitib", "katub",
etc., there would be no mistaking the lexical item, only which form of
the item "ktb" you intended to use.

I bring this up to emphasize that it is the phonation of the 't' sound
for the ta marbuta that carries the meaning. It signals the genitive
relation. "risAla al-kAtib", spoken without the 't' sound, means "a
letter, the writer"; but with it, "risAlatu ~l-kAtib" means "the letter
of the writer". Of course, in these two examples I left out the ta
marbuta symbol in order to emphasize the pronunciation; in Arabic,
however, they would have received the same spelling. Assuming, that is,
that the vowelling is omitted, as it usually is (tanween being
considered vowelling). If vowelling is included, then tanween means the
noun is indefinite, "a letter" and so the ta is not pronounced. Unless
you're reading with full phonation, in which case, you would say
"risAlatun", but the 't' would not be a binding 't', since the notion of
binding reflects a grammatical operation.

Sorry to be so long-winded about this. The basic point is this: ta'
marbuta cannot be modeled by a simple grapheme, but must be modeled
more like upper/lower case letters: different forms for different
(grammatical) situations.

Rather like alef maqsura, except that with ta marbuta you can always
determine what form to use, whereas with alef maqsura you need a

To meet the expectations of Arab literates, a search for a word like
risAla# should return all forms of the word, including the indefinite
noun "risAla#un", the definite noun with article "al-risAla#u", and any
form in the genitive construction, such as "risAla#uhu" and "risAla#a
~l-kAtibi". Such forms should also sort together. Unicode would not
support this, unless I misunderstood the spec.
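
To make the search problem concrete, here is a small Python sketch. It assumes the usual unvowelled spellings, with the suffixed form stored as U+062A (TEH) rather than U+0629 (TEH MARBUTA). The fold function is a hypothetical search key of my own, not something Unicode defines.

```python
TEH_MARBUTA = "\u0629"  # ARABIC LETTER TEH MARBUTA
TEH = "\u062A"          # ARABIC LETTER TEH

# Unvowelled spellings, given as escapes so the code points are explicit.
RISALA = "\u0631\u0633\u0627\u0644\u0629"            # risAla#
RISALATUHU = "\u0631\u0633\u0627\u0644\u062A\u0647"  # risAla#uhu


def fold(text):
    """Hypothetical search key: treat TEH MARBUTA and TEH as one letter."""
    return text.replace(TEH_MARBUTA, TEH)


def naive_match(query, word):
    # Plain code point comparison, no folding.
    return word.startswith(query)


def folded_match(query, word):
    # Compare the folded keys instead of the raw strings.
    return fold(word).startswith(fold(query))


print(naive_match(RISALA, RISALATUHU))   # False
print(folded_match(RISALA, RISALATUHU))  # True
```

The folding is deliberately crude: it would also match words whose stem really ends in ta, so a real search layer would need something smarter. The same key could be used for sorting the forms together.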

But of course other languages using the Arabic script have different

General question: I assume the editors monitor this list, so I didn't
copy the bug list. Should I?

