Thanks for your response. Alas, I don't think it addresses the issues I
would like to raise: I think the passage you quote misinterprets the
semantics of ta marbuta. However, as a courtesy to those on the list who
consider such discussions more noise than signal, I'll put the full
explanation of it on a web page, along with the text of a standard grammar,
and send the URL to the list.

You advance one argument that I would like to address, however, and that is
the CAT/cAt/cat/etc. distinction, the handling of which you seem to claim is
an implementation issue. I would point out that case distinctions are
normative in Unicode, so we know the semantics regardless of implementation
behavior. It's not clear to me if Joining Class is normative or not:
presumably it is, though it is not listed in the table at the beginning of
Chapter 4. But in this case, it looks to me like Unicode positively
prohibits proper interpretation of ta marbuta in Arabic language texts!
Have I misunderstood something about Unicode here?
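
To make the contrast concrete, here is a minimal Python sketch using only the standard library. It checks whether two strings are related by the case-folding and compatibility data that ships with Unicode. The helper name related_by_unicode_data is my own invention, and the check is only illustrative, not a statement of what the standard requires.

```python
import unicodedata

TEH_MARBUTA = "\u0629"  # ARABIC LETTER TEH MARBUTA
TEH = "\u062A"          # ARABIC LETTER TEH


def related_by_unicode_data(a: str, b: str) -> bool:
    """Hypothetical helper: True if a and b match after case folding
    or after NFKC compatibility normalization."""
    if a.casefold() == b.casefold():
        return True
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


# Case variants are tied together by the character data itself.
print(related_by_unicode_data("CAT", "cat"))
# TEH MARBUTA and TEH are not linked by folding or normalization.
print(related_by_unicode_data(TEH_MARBUTA, TEH))
```

The first call reports a match and the second does not: the character data relates CAT to cat, but offers no comparable link between TEH MARBUTA and TEH.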
> -----Original Message-----
> From: Becker, Joseph [mailto:Joseph.Becker@pahv.xerox.com]
> Sent: Monday, August 23, 1999 3:06 PM
> To: Unicode List
> Cc: Unicode List
> Subject: RE: ta' marbuta
> > Forgive me if you've already addressed this
> It was a while ago!:
> | Date: 10 Nov 95 14:41:36 PST (Friday)
> | Subject: Re: More Arabic
> | ...
> | In the case of TEH MARBUTA, as a phenomenon it too is solely final.
> If a word which would have ended with TEH MARBUTA is extended with
> grammatical endings, the typist must replace TEH MARBUTA by an ordinary TEH;
> i.e. the encoding is designed to require such replacement, rather than
> having the TEH MARBUTA mutate into the *appearance* of a TEH. This
> preserves the uniqueness of the correct spelling.
> | ...
> The point is that the encoding is chosen to *model* (I like your term) the
> linguistic / orthographic realities in one way or another, with design
> choices being made to meet encoding constraints. In this case, the
> convenience of providing an (automatic) medial-TEH form for TEH MARBUTA
> would be balanced by the cost of creating duplicate encodings for all forms
> like "risAla#uhu / risAlatuhu".
> This tradeoff could have been made the other way, but ultimately all that
> matters is that a single encoding *convention* become adopted in practice.
> My understanding is that TEH MARBUTA is treated in this manner by the ASMO
> 449 and ISO 8859-6 standards that were the source for the Unicode Arabic
> set, i.e. that "risAla#uhu" is conventionally encoded as "risAlatuhu" in
> on-line Arabic text.
> > a search for a word like risAla# should return all forms of the word ...
> > Unicode would not support this
> You correctly noted the analogy to upper/lower case letters, so the
> analogous statement would be
> a search for a word like cat should return all forms of the word (incl.
> Cat and CAT) ...
> ASCII would not support this
> These statements are category errors: the encodings of course support any
> text processing you care to program.
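
For concreteness, the kind of application-level processing the quoted message has in mind might look like the Python sketch below. It is only a rough illustration under my own assumptions: search_key and loose_find are hypothetical helpers, the sample words are unvocalized spellings of risAla# and risAlatuhu, and prefix matching is a crude stand-in for real morphological analysis.

```python
TEH_MARBUTA = "\u0629"  # ARABIC LETTER TEH MARBUTA
TEH = "\u062A"          # ARABIC LETTER TEH

# Unvocalized sample spellings (assumed for illustration).
RISALA = "\u0631\u0633\u0627\u0644\u0629"            # risAla#
RISALATUHU = "\u0631\u0633\u0627\u0644\u062A\u0647"  # risAlatuhu


def search_key(word: str) -> str:
    """Hypothetical helper: fold case and map TEH MARBUTA to TEH."""
    return word.casefold().replace(TEH_MARBUTA, TEH)


def loose_find(stem: str, words: list[str]) -> list[str]:
    """Return the words whose folded form starts with the folded stem."""
    key = search_key(stem)
    return [w for w in words if search_key(w).startswith(key)]


# The same folding step serves both halves of the analogy.
latin_hits = loose_find("cat", ["Cat", "CAT", "dog"])
arabic_hits = loose_find(RISALA, [RISALA, RISALATUHU])
```

Note that the TEH MARBUTA folding here is a rule written into the application, whereas the case folding comes from data the Python runtime takes from the Unicode character database.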
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:51 EDT