RE: FATHATAN (was: RE: Presentation of unknowned composited sequences

From: Reynolds, Gregg (greynolds@datalogics.com)
Date: Tue Jul 20 1999 - 09:57:53 EDT


Hello Roozbeh,

> -----Original Message-----
> From: Roozbeh Pournader [mailto:roozbeh@sina.sharif.ac.ir]
> Sent: Tuesday, July 20, 1999 9:06 AM
> To: Arno Schmitt
> Cc: Unicode List; Gregg Reynolds
> Subject: Re: FATHATAN (was: RE: Presentation of unknowned composited
> sequences
>
>
>
> On Thu, 15 Jul 1999, Arno Schmitt wrote:
>
> > Dear Gregg, Dear Roozbeh, Dear Reader of The Holy Qur'an
> >
> > in all my copies of the Qur'an there are
> > two sorts of fathatan,
> > two sorts of dammata,
> > two sorts of kasratan.
>
> Yes, but there is no difference in meaning, the shape of
> fathatan in this
> case is algorithmically derivable, based on a set of rules named
> ``tajweed'' I think. They are there to ease reading, to help the
> non-professional reader while reading. They help him not to think of
> tajweed rules.

This is pretty much the case for all Arabic vowel markings: they can be
inferred for the most part if your software is smart enough. Ditto for
hamza in many cases. But this brings up an interesting design point: is the
goal to support the absolute minimal encoding, relying on intelligent
software to infer information that is not explicitly encoded, or is the goal
to provide maximal encoding, so that minimal software can do the right
thing? It might be a Good Thing for Unicode to explicitly address this
tradeoff.

My own opinion is that, as is the case for most engineering tradeoffs, there
is a sweet spot somewhere in the middle. "Maximal" encoding, after all,
could be interpreted to mean full encoding of the underlying grammatical
structure. It would certainly be useful to have a standard for such
encoding, but I'm betting Unicode does not want to go there. But "minimal"
would not do much to spread i18n-enabled software. In this particular case
(and some others, not limited to Arabic), I would argue for the option of
explicit encoding, so I would like to see distinct codepoints for these
"letters". On theoretical grounds: they do have different semantics. On
pragmatic grounds: the more information we get to include in the text
itself, the less we need to rely on specialized software.
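
To make the tradeoff concrete, here is a rough sketch in Python. Everything
in it is an assumption made for illustration: the shape names, the
placeholder set of trigger letters, and the two private-use code points are
hypothetical, and the one-line rule stands in for whatever the real tajweed
rules require. It only contrasts the two designs: one code point plus
inference from context, versus distinct code points and a plain lookup.

```python
# Illustrative sketch only. The rule and the private-use code points are
# hypothetical; nothing here is defined by Unicode or by tajweed itself.

FATHATAN = "\u064B"  # ARABIC FATHATAN, a single code point for either shape

# Assumed placeholder for the letters that select the first shape
# (hamza, heh, ain, hah, ghain, khah). The real rules are more involved.
ASSUMED_TRIGGER_LETTERS = frozenset("\u0621\u0647\u0639\u062D\u063A\u062E")


def infer_shape(next_letter):
    """Minimal encoding: the software derives the shape from context."""
    if next_letter in ASSUMED_TRIGGER_LETTERS:
        return "variant-a"
    return "variant-b"


# Explicit encoding: hypothetical private-use code points, one per shape.
EXPLICIT_SHAPES = {
    "\uE000": "variant-a",
    "\uE001": "variant-b",
}


def read_shape(mark):
    """Maximal encoding: the text carries the shape, so this is a lookup."""
    return EXPLICIT_SHAPES[mark]


if __name__ == "__main__":
    print(infer_shape("\u0639"))  # shape inferred from a following ain
    print(read_shape("\uE001"))   # shape stated by the text itself
```

With inference, every application needs its own copy of the rule table; with
explicit code points, the only thing an application needs is the mapping from
code point to shape.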

For those of you who have no idea what we're talking about, I will post a
note tonight after work explaining the forms under discussion, with
examples.

> I don't know the Unicode policy about this,
> but I am tired
> of these symbol-characters. Adding these for only being able to have
> Qur'an as plain text is not a good idea. (You know, I want
> to see many
> vendors or authors implement Arabic unicode in their
> software, and these
> make it more ambiguous and difficult.)

On the contrary, I think it reduces the amount of specialized logic required,
and thus increases the likelihood of portable software that can handle
languages using Arabic script. Remember the holy grail of i18n is one
general piece of software that adapts with minimal expense.

Here's a question you might be able to help me with: are such markings also
used in other texts, such as the hadeeth literature?

>
> > In the standard Kaireene edition there are explanations at the end
> > of the volume with a paragraph on tarkeeb al-Harakatain.
>
> I don't know what do you mean by "standard Kaireene edition". What we

We probably don't want to get into this discussion.

> consider the standard in the muslim world is the one published by
> government of Saudi Arabia. (With differences in non-letters
> of course,
> for example the official Iranian Qur'an has differences in
> placement of
> "waqf" signs, like U+06D6 or U+06D9.)
>
> --Roozbeh
>
>
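
For anyone who wants to check which characters are under discussion, the
following minimal sketch prints the standard names of the three tanween
marks and the two waqf signs cited above. It assumes a Python interpreter
whose bundled unicodedata module covers the Arabic block; the describe
helper is my own name, not part of any library.

```python
import unicodedata

# The three tanween marks, then the two waqf signs cited in the thread.
CODE_POINTS = [0x064B, 0x064C, 0x064D, 0x06D6, 0x06D9]


def describe(cp):
    """Return 'U+XXXX NAME' for one code point."""
    return "U+%04X %s" % (cp, unicodedata.name(chr(cp)))


if __name__ == "__main__":
    for cp in CODE_POINTS:
        print(describe(cp))
```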


