RE: ta' marbuta

From: Becker, Joseph (Joseph.Becker@pahv.xerox.com)
Date: Mon Aug 23 1999 - 16:10:54 EDT


> Forgive me if you've already addressed this

It was a while ago:

    | Date: 10 Nov 95 14:41:36 PST (Friday)
    | Subject: Re: More Arabic
    |
    | ...
    |
    | In the case of TEH MARBUTA, as a phenomenon it too is solely final.
    | If a word which would have ended with TEH MARBUTA is extended with
    | grammatical endings, the typist must replace TEH MARBUTA by an
    | ordinary TEH; i.e. the encoding is designed to require such
    | replacement, rather than having the TEH MARBUTA mutate into the
    | *appearance* of a TEH. This preserves the uniqueness of the correct
    | spelling.
    |
    | ...
    |

The point is that the encoding is chosen to *model* (I like your term) the
linguistic / orthographic realities in one way or another, with design
choices being made to meet encoding constraints. In this case, the
convenience of providing an (automatic) medial-TEH form for TEH MARBUTA
would have to be weighed against the cost of creating duplicate encodings
for every such form, e.g. "risAla#uhu" alongside "risAlatuhu".

This tradeoff could have been made the other way, but ultimately all that
matters is that a single encoding *convention* be adopted in practice.
My understanding is that TEH MARBUTA is treated in this manner by the ASMO
449 and ISO 8859-6 standards that were the source for the Unicode Arabic
set, i.e. that "risAla#uhu" is conventionally encoded as "risAlatuhu" in
on-line Arabic text.
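To make the convention concrete, here is a minimal Python sketch of what "the typist must replace TEH MARBUTA by an ordinary TEH" means at the code-point level. The helper name `attach_suffix` is hypothetical, and the example assumes unvocalized text in which the pronoun suffix -hu is written as a bare HEH (U+0647):

```python
TEH_MARBUTA = "\u0629"  # ARABIC LETTER TEH MARBUTA
TEH = "\u062A"          # ARABIC LETTER TEH

def attach_suffix(stem: str, suffix: str) -> str:
    """Hypothetical helper: join a grammatical suffix to a stem, storing a
    final TEH MARBUTA as an ordinary TEH once the word is extended."""
    if suffix and stem.endswith(TEH_MARBUTA):
        stem = stem[:-1] + TEH
    return stem + suffix

risala = "\u0631\u0633\u0627\u0644\u0629"  # risAla#, ending in TEH MARBUTA
hu = "\u0647"                              # pronoun suffix -hu (assumed unvocalized)

word = attach_suffix(risala, hu)
print([f"U+{ord(c):04X}" for c in word])
# ['U+0631', 'U+0633', 'U+0627', 'U+0644', 'U+062A', 'U+0647']
```

Because only the TEH spelling is ever produced for the suffixed form, "risAla#uhu" never arises as a second encoding of the same word.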

> a search for a word like risAla# should return all forms of the word ...
> Unicode would not support this

You correctly noted the analogy to upper/lower case letters, so the
analogous statement would be

  a search for a word like cat should return all forms of the word
  (incl. Cat and CAT) ...
  ASCII would not support this

These statements are category errors: an encoding only fixes how the text
is represented, and of course supports any text processing you care to
program on top of it.
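As a rough illustration of that point, the search can fold TEH MARBUTA to TEH in the same way it folds case, without any change to the encoding. This is a sketch under stated assumptions: the function names are hypothetical, text is assumed unvocalized and whitespace-tokenized, and the TEH MARBUTA/TEH equivalence is a choice made by this search code, not a folding defined by Unicode:

```python
from typing import List

TEH_MARBUTA = "\u0629"  # ARABIC LETTER TEH MARBUTA
TEH = "\u062A"          # ARABIC LETTER TEH

def fold(s: str) -> str:
    # Case-fold Latin letters (cat/Cat/CAT) and treat TEH MARBUTA as TEH,
    # so that a stem and its suffixed forms share a common prefix.
    return s.casefold().replace(TEH_MARBUTA, TEH)

def find_word_forms(query: str, text: str) -> List[str]:
    """Return whitespace-separated tokens of `text` that start with `query`
    after folding. A crude prefix heuristic, not a morphological analyzer."""
    q = fold(query)
    return [tok for tok in text.split() if fold(tok).startswith(q)]

print(find_word_forms("cat", "Cat and CAT and dog"))  # ['Cat', 'CAT']

risala = "\u0631\u0633\u0627\u0644\u0629"      # risAla#
risalatuhu = "\u0631\u0633\u0627\u0644\u062A\u0647"  # risAlatuhu
print(find_word_forms(risala, risala + " " + risalatuhu) == [risala, risalatuhu])  # True
```

A prefix match will also catch unrelated words (e.g. "category" for "cat"); a real search would use proper morphological analysis, but that too is just processing layered over the same encoding.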

Joe
