RE: Not all Arabics are created equal...

Date: Tue Jul 11 2000 - 12:26:59 EDT

Greg Reynolds wrote:
> The only remedy I can see for this particular flaw in Unicode is the
> introduction of a codepoint to set or maybe swap the evaluation rule for
> number strings.

It is not a flaw. Rather, IMHO, we are all making the mistake of considering
this an *encoding* issue. It is not: it is a *keyboard input* problem.

In fact, whether your pen traces digits from R to L or from L to R, the
result on paper is always the same: least significant digit on the *right*,
and most significant digit on the *left*. And knowing how people *read*
numbers may be of interest only to linguists (if read aloud) or to
psychologists (if read silently).

So it makes sense, on computers, to adopt a uniform encoding sequence for
the digits in a number. In theory, there is absolutely no valid reason to
prefer MSB-first over LSB-first encoding or vice versa. In practice,
however, the MSB-first sequence is so widely implemented on computers that
it is worth sticking to it.
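
To make the symmetry concrete, here is a minimal Python sketch (the function
names are mine, purely illustrative): the same value can be recovered from
either storage order, only the loop differs.

```python
def value_msd_first(digits):
    """Decode a digit string stored most significant digit first."""
    n = 0
    for d in digits:
        # Each new digit is less significant than everything read so far.
        n = n * 10 + int(d)
    return n


def value_lsd_first(digits):
    """Decode a digit string stored least significant digit first."""
    n = 0
    weight = 1
    for d in digits:
        # Each new digit is ten times more significant than the previous one.
        n += int(d) * weight
        weight *= 10
    return n


# The same number under the two hypothetical storage conventions.
same_value = value_msd_first("1999") == value_lsd_first("9991")
```

Neither decoder is harder than the other; the preference for the first one
is purely a matter of existing practice.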

The real problem is: in which order should the user *type* the digits in a
number? This is important, because, of course, the typing sequence should
correspond as closely as possible to the habits of handwriting.

A right-to-left typing order simply implies that, after having typed a
digit, the "caret" (insertion point) must be placed on the *left* of the
digit just entered, so that it is in the proper state to accept the next
digit ("next" in typing order, but not necessarily "before" in storing
order). Analogously, in left-to-right typing order, the caret should be
placed on the *right* of the digit just typed.
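
A rough Python sketch of this caret behaviour, assuming a hypothetical
DigitBuffer class and rtl_typing flag (neither belongs to any real input
method): the digits are always stored most significant first, and only the
caret movement depends on the typing order.

```python
class DigitBuffer:
    """Toy number field: storage is always most significant digit first."""

    def __init__(self, rtl_typing=False):
        self.rtl_typing = rtl_typing  # hypothetical typing-order flag
        self.digits = []              # storage order, most significant first
        self.caret = 0                # storage index where the next digit goes

    def type_digit(self, digit):
        self.digits.insert(self.caret, digit)
        if not self.rtl_typing:
            # LTR typing: the caret moves to the right of the new digit.
            self.caret += 1
        # RTL typing: the caret stays to the left of the new digit,
        # so its storage index does not change.

    def text(self):
        return "".join(self.digits)


ltr = DigitBuffer(rtl_typing=False)
for d in "123":          # typed most significant digit first
    ltr.type_digit(d)

rtl = DigitBuffer(rtl_typing=True)
for d in "321":          # typed least significant digit first
    rtl.type_digit(d)
```

Both buffers end up holding the same stored sequence, which is the point:
the typing order never reaches the encoding.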

So I would change the issue to "What should the *typing* sequence for Arabic
digits be?" And the best answer, IMHO, is "It depends"...

And it indeed depends on a lot of factors:

- The language used (Arabic, Urdu, Persian, Pashto, etc.);

- The syntax assumed for numbers (for languages, like Arabic, that admit
more than one);

- The set of digits used (U+0030..0039, U+0660..0669, or U+06F0..06F9 are
all used with the Arabic script);

- The typing context (e.g. a car registration vs. a price);

- The personal culture or taste (e.g. a Moroccan mathematician vs. an
Iranian grocer).
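
As a small illustration of the digit-set factor, here is a Python sketch
using the standard unicodedata module; the helper name to_ascii_digits is
mine. All three ranges carry the same decimal values, so the choice of digit
set need not affect how a number is stored or evaluated.

```python
import unicodedata


def to_ascii_digits(text):
    """Map any Unicode decimal digit to U+0030..0039, leaving other
    characters untouched. Note that this also maps digits of scripts
    other than Arabic, since it relies on the Unicode decimal property."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-digits
        out.append(ch if value is None else str(value))
    return "".join(out)


european = to_ascii_digits("123")                    # U+0030..0039
arabic_indic = to_ascii_digits("\u0661\u0662\u0663")  # U+0660..0669
extended = to_ascii_digits("\u06F1\u06F2\u06F3")      # U+06F0..06F9
```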

All in all, the reasons for this choice are so complex and personal that
it will never be possible to find a compromise.

So, IMHO, computers should have a flag for the typing order of digits, set
to a default at start up, and modifiable by the user at any time. Like "caps
lock", you know?

And, after all, why should such a flag only be there for users of RTL
scripts? It is not 100% true that we Europeans *always* write numbers LTR.
Personally, I always use the RTL order when writing the results of sums.

_ Marco
