Re: Help with some Arabic letters

From: Gregg Reynolds (greynolds@greynolds.com)
Date: Thu Dec 16 1999 - 04:24:04 EST

Next message: Jonathan Rosenne: "RE: Cantillation marks (teamim)"
Previous message: Roozbeh Pournader: "Re: Help with some Arabic letters"
Maybe in reply to: Patrick Andries: "Help with some Arabic letters"

Roozbeh Pournader wrote:
>
> On Wed, 15 Dec 1999, Gregg Reynolds wrote:
>
> ...blah blah...

>
> I'm not sure if a "GENERAL" tanwin helps. It's the definition of character
> that is somehow dependent on the user. If you are an Englishman, you may

No, it sounds like you're not following the argument. The question is
what is the meaning of tanween. There's no such thing as a "general"
tanwin in Arabic. Admiral Tanween, maybe. Ha, ha, that was a funny.
But to the point: "tanween", in Arabic, is the verbal noun associated
with the verb "nawwan", a transitive verb derived from "noon", i.e.
U+0646. To "nawwan" something means to stick a "noon" on it. There is
absolutely no ambiguity about this in Arabic: tanween means the
"noonation" of a word, which means adding a "noon" phoneme after the
vowel case ending. There is no question of fathatan, dammatan, and
kasratan - the way these words are used in Unicode is absurd from an
Arabic perspective. "Fathatan" means literally *a* fatha (indefinite)
in the accusative case, *not* fatha with tanween. The 'a' at the end,
to which the 'n' of tanween is suffixed, is the accusative case marker.
The final 'n' is nunation, and it marks indefiniteness. The terms
Unicode uses for U+064B, U+064C, and U+064D are literally at odds with the
representative glyphs. Indeed so are the words for U+064E, U+064F, and
U+0650, which should be called "fathatun", "dammatun", and "kasratun",
respectively - the name, plus the nominative case marker ("u"), plus the
indefinite marker (the "n" of nunation).
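
For readers who want to check the names under discussion, here is a minimal Python sketch. It assumes only the standard `unicodedata` module; the helper name `unicode_names` is made up for this illustration.

```python
import unicodedata

# Code points mentioned above: noon, the three "-tan" marks, the three short vowels.
CODE_POINTS = [0x0646, 0x064B, 0x064C, 0x064D, 0x064E, 0x064F, 0x0650]

def unicode_names(code_points):
    """Return a {code point: Unicode character name} mapping."""
    return {cp: unicodedata.name(chr(cp)) for cp in code_points}

NAMES = unicode_names(CODE_POINTS)

for cp, name in NAMES.items():
    print(f"U+{cp:04X}  {name}")
```

The listing shows U+064B as ARABIC FATHATAN and U+064E as ARABIC FATHA, which are the names being objected to here.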

To illustrate for those without Arabic, using # as ta marbuta (marking a
unit noun) and N as tanween:

        "Look! It's a damma#uN!" - nominative, indefinite
        "I see a fatha#aN in your future!" - accusative (direct object), indefinite
        "The size of a fatha#iN is generally small." - genitive (possessive), indefinite

        "The fatha#u is not to be confused with the tanween." - nominative, definite (abstract)
        "Behold the fatha#a: it laboreth not." - accusative, definite
        "I don't understand the mysteries of the fatha#i." - genitive, definite (abstract)

> prefer writing German a" with two keystrokes, and considering it two
> characters, but German people prefer it as one character, with one
> keystroke, etc.

Right: we should listen to the Germans, or, in the case of tanween, the
Arabs, for whom there is no controversy in this regard. And keystrokes
are irrelevant; the only things that count are the rules governing the
interpretation of text strings.

> There are legacy usages.

Presumably you mean legacy software; in my opinion the only legacy
*usage* that counts is the one sanctioned by the tradition of the
literate community, not the one foisted on us by mediocre software
design.

> Your addition will help sorting, but will
> increase the confuses.

That depends on whom you ask. A proper tanween would make things easier
for lots of users and implementers. For example, it would increase the
likelihood that vanilla "little language" software such as grep would be
useful beyond English.
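
The grep point can be sketched in Python. Everything here is an assumption made for illustration: Unicode has no separate "tanween" character, so a private-use code point stands in for one, and `decompose_tanween` is a hypothetical helper, not part of any standard or library.

```python
# Hypothetical stand-in for a separate "tanween" character. No such character
# exists in Unicode; a private-use code point is used purely for illustration.
TANWEEN = "\uE000"

DAMMA = "\u064F"     # ARABIC DAMMA
DAMMATAN = "\u064C"  # ARABIC DAMMATAN

KITAB = "\u0643\u062A\u0627\u0628"  # the letters kaf, teh, alef, beh

def decompose_tanween(text):
    """Rewrite each single-code-point "-tan" mark as vowel + TANWEEN."""
    table = {
        "\u064B": "\u064E" + TANWEEN,  # fathatan -> fatha + tanween
        "\u064C": "\u064F" + TANWEEN,  # dammatan -> damma + tanween
        "\u064D": "\u0650" + TANWEEN,  # kasratan -> kasra + tanween
    }
    return "".join(table.get(ch, ch) for ch in text)

definite = KITAB + DAMMA        # ends in the nominative vowel only
indefinite = KITAB + DAMMATAN   # same word, nominative with nunation

# Current encoding: the vowel is folded into U+064C, so a plain search misses it.
print(definite in indefinite)                     # False
# Decomposed model: the vowel is its own code point, so a plain search finds it.
print(definite in decompose_tanween(indefinite))  # True
```

A plain substring test stands in for grep here; the same reasoning applies to any tool that compares code points literally.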

> There exists well-defined legacy usage of tanwin

I don't think so, even aside from the fact that "tanween" in its proper
meaning is unavailable to legacy encoding systems. Even in the USA,
penetration of computers into the fabric of daily life is far from
complete. In the Arabic world I would venture to say (purely
speculatively) that it is minuscule. The only legacy is the investment
of SW firms, mostly in the West; why should Arabic speakers give a rat's
rear about their profits or market share? The design of written
language encodings should be driven by the language communities, not by
Western technocrats. Even if every piece of software ever written
supported "fathatan" et al., it would still be an improper model of
written Arabic.

> characters, and I think when a people needs both "fatha" and "fathatan",
> can search with regular expressions, or something equivalent to English
> "ignore case" searches. ADDING THOSE WILL REALLY COMPLICATE
> IMPLEMENTATION.

Whose implementation? Of what? In my opinion the notion that the only
people who count are the ones who write low-level, close-to-the-metal
software, and who think anything beyond ASCII is an unbearable burden, is
plain old wrong. Proper modeling of written languages is plainly better
for software that handles such languages. For some software it would
simplify implementation.
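
For comparison, the workaround suggested in the quoted text (a regular expression playing the role of an "ignore case" search) might look like the following under the current encoding. This is only a sketch: `nunation_insensitive` and `PAIRS` are names invented here, and every application that searches Arabic text would need to carry something like them.

```python
import re

# Each short vowel paired with the single code point Unicode uses for its "-tan" form.
PAIRS = {"\u064E": "\u064B", "\u064F": "\u064C", "\u0650": "\u064D"}

KITAB = "\u0643\u062A\u0627\u0628"  # the letters kaf, teh, alef, beh

def nunation_insensitive(query):
    """Compile a regex in which every short vowel also matches its "-tan" form."""
    parts = []
    for ch in query:
        if ch in PAIRS:
            # Character class: the bare vowel or its nunated counterpart.
            parts.append("[" + ch + PAIRS[ch] + "]")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

pattern = nunation_insensitive(KITAB + "\u064E")   # query ends in fatha
print(bool(pattern.search(KITAB + "\u064B")))      # text ends in fathatan
```

The extra mapping table is the cost of the workaround; with the vowel and the nunation encoded separately, a literal search would need no such table.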

> And remember that you are suggesting this out of logic,
> which is not always in the same direction of usage.

No, I'm simply pointing out how written Arabic works. If software
developers say "we can't do that, it's too hard" that's fine, but they
should not turn around and claim to be language neutral.

One of the reasons I get a little exercised about stuff like this is
that Arabic is so perfectly suited to computation, yet so poorly served
by existing standards and implementations. Bear with me; eventually
I'll be able to explain why to people who don't have Arabic. I hope.

Sincerely,

-gregg


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:56 EDT