RE: Not all Arabics are created equal...

From: Gregg Reynolds
Date: Mon Jul 10 2000 - 22:42:58 EDT

> -----Original Message-----
> From: John Cowan []
> Sent: Saturday, July 08, 2000 6:50 PM
> To: Gregg Reynolds
> Cc: Unicode List
> Subject: RE: Not all Arabics are created equal...
> On Thu, 29 Jun 2000, Gregg Reynolds wrote:
> > As a point of fact: Arabic is not bidirectional, in spite of the
> > protestations of Unicode. It's more accurate and less prejudicial
> > technologically to call it a
> > Least-Significant-Digit-First language.
> IIRC, we determined that in fact Arabic digits are written MSD first,
> definitely when writing Farsi (I'm not sure if we got evidence on
> writing Arabic or not).

It's the same for all Arabiform writing conventions, so far as I know, and
it's actually quite simple. You need only ask two questions: what is the
reading order? and what is the evaluation rule for strings of ciphers?
"Reading order" means canonical reading order, since this is not about
deciphering what goes on inside the brains of readers. The evaluation rule
for Arabiform number strings is the well-known decimal positional rule.
Least significant digit according to that rule always comes first _in
reading order_. The opposite obtains in European languages. Notice the
advantage of LSD first: you don't have to count the number of digits in
order to start reading them off. (Quick, read "57777777" aloud.) Well,
these days the speech protocol is MSD first (except for the two LSD digits),
but who's counting? Neither writing protocol nor speech protocol is
relevant to the interpretation of the written forms. Does it really matter
which pixels get lit first? And the word "seventeen" does not mean the
seven gets written before the teen. I imagine this is pretty obvious so I
won't belabor it. For the record, the Arabic language (Classical, spoken
variety) has always accepted both LSD and MSD first verbalization of
numbers. See Wright's Grammar, Book I paragraph 327 (page 259 in my
edition).

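To make the distinction concrete, here is a minimal sketch (in Python, my
own illustration, not anything defined by Unicode) of the two evaluation
rules. Both take the same digits in reading order; they differ only in
which end of the place-value scale the first digit lands on:

```python
def eval_msd_first(digits):
    """European rule: the first digit read is the most significant."""
    value = 0
    for d in digits:
        value = value * 10 + d
    return value

def eval_lsd_first(digits):
    """Arabiform rule: the first digit read is the least significant."""
    value = 0
    for place, d in enumerate(digits):
        value += d * 10 ** place
    return value

# Reading off the digits 7 then 5:
print(eval_msd_first([7, 5]))  # 75
print(eval_lsd_first([7, 5]))  # 57
```

Note that the LSD-first rule can assign a place value to each digit as it
arrives, without knowing how many digits follow, which is the advantage
described above.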
The only remedy I can see for this particular flaw in Unicode is the
introduction of a codepoint to set or maybe swap the evaluation rule for
number strings.
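One way such a codepoint might behave, sketched purely hypothetically in
Python (the `SWAP_EVAL` character below is a placeholder of my own; no
such codepoint exists in Unicode):

```python
SWAP_EVAL = "\x01"  # hypothetical stand-in for the proposed codepoint

def interpret(text):
    """Return the integer values of digit runs in text, toggling the
    evaluation rule whenever the hypothetical SWAP_EVAL code appears."""
    values = []
    lsd_first = False  # default: MSD-first (European rule)
    run = ""
    for ch in text + "\x00":  # sentinel to flush the final run
        if ch.isdigit():
            run += ch
            continue
        if run:
            # Under the LSD-first rule, the run reads back-to-front.
            values.append(int(run[::-1] if lsd_first else run))
            run = ""
        if ch == SWAP_EVAL:
            lsd_first = not lsd_first
    return values

print(interpret("75"))             # [75]
print(interpret(SWAP_EVAL + "75")) # [57]
```

The point of the sketch is only that a single stateful marker suffices:
everything downstream of it reinterprets digit runs without any change to
the stored digits themselves.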


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:05 EDT