RE: Unicode editing (RE: Unicode complaints)

From: Marco Cimarosti
Date: Mon Mar 19 2001 - 08:52:58 EST

Roozbeh Pournader wrote:
> ...Take this example: she wants to type "MEEM-SEEN-TEH-QAF-LAM". She
> presses Meem, she sees an isolated Meem, she presses Seen, the Meem
> becomes initial Meem, and a final Seen gets added. She presses Teh,
> Seen becomes medial, a final Teh gets added, .... What if she could see
> initial Meem, medial Seen, etc. at the beginning? I know, this way she
> would see a medial Lam at first, but that will become a final Lam as
> soon as she presses the space.

I never considered this. For a casual user it is so cute to see the letters
changing shape, and it is also very instructive for one learning the script.

But I see how this must be annoying for people typing in Arabic all the
time.

> Perhaps there's an easy way for an app to achieve this without resorting
> to maintaining an additional parallel representation of the text as I
> understood Marco to be suggesting. Here's an idea:

I was not suggesting that to solve this problem (as I said, I didn't know
the problem existed before Roozbeh mentioned it).

Moreover, I was not suggesting a *parallel* representation of the text, if
you mean by this that both the original and the "WYSIWYG" versions are kept
in memory at the same time.

I was suggesting an even more heretical approach: not using Unicode at all
internally. Rather, I'd translate the Unicode text to an *alternative*
representation (probably a sort of "pseudo-Unicode", though not quite the
same thing), use it for editing, and translate it back to proper Unicode
only at the end of the editing session.
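
A minimal sketch of such a round trip, under loud assumptions: the post does
not say what "pseudo-Unicode" would look like, so Arabic Presentation Forms-B
(U+FE70..U+FEFF) stand in for it here, and the names FORMS, BASE,
to_editing_form and to_unicode are hypothetical. Joining behaviour is inferred
from which positional forms a letter has; diacritics, ligatures such as
lam-alef, and letters outside Forms-B are ignored.

```python
import unicodedata

TAGS = ("<isolated>", "<final>", "<initial>", "<medial>")

# Build both directions of the mapping from the Unicode character database.
# FORMS: base letter -> {form name: presentation-form character}
# BASE:  presentation-form character -> base letter
FORMS, BASE = {}, {}
for cp in range(0xFE70, 0xFF00):
    parts = unicodedata.decomposition(chr(cp)).split()
    if len(parts) == 2 and parts[0] in TAGS:
        base = chr(int(parts[1], 16))
        FORMS.setdefault(base, {})[parts[0][1:-1]] = chr(cp)
        BASE[chr(cp)] = base


def joins_backward(ch):
    """True if ch can connect to the letter typed before it."""
    return "final" in FORMS.get(ch, {})


def joins_forward(ch):
    """True if ch can connect to the letter typed after it."""
    return "initial" in FORMS.get(ch, {})


def to_editing_form(text):
    """Replace each known Arabic letter with an explicit positional form."""
    out = []
    for i, ch in enumerate(text):
        forms = FORMS.get(ch)
        if forms is None:
            out.append(ch)  # not a letter in the table: keep unchanged
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        to_prev = joins_backward(ch) and joins_forward(prev_ch)
        to_next = joins_forward(ch) and joins_backward(next_ch)
        if to_prev and to_next:
            key = "medial"
        elif to_prev:
            key = "final"
        elif to_next:
            key = "initial"
        else:
            key = "isolated"
        out.append(forms.get(key, ch))
    return "".join(out)


def to_unicode(editing_text):
    """Translate the editing form back to ordinary Unicode letters."""
    return "".join(BASE.get(ch, ch) for ch in editing_text)
```

With this stand-in, an editor would call to_editing_form() when a document is
opened and to_unicode() when it is saved. Text that already contains
presentation forms would not survive the round trip unchanged, which is one
reason a real "pseudo-Unicode" would probably need its own code space.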

> If the following two conditions apply:
> 1. the insertion point is not before a word-forming Arabic (or other
>    connective script) character, and
> 2. some local (i.e. adjacent to the insertion point) change to the text
>    (insertion or deletion) has occurred since the insertion point was
>    moved to its current position
> then output a ZWJ immediately before the insertion position when
> rendering to the screen. The ZWJ is not added to the backing store; it
> is just inserted into the stream sent to the screen.

I think you need another condition:

3. a word-forming Arabic (or other connective script) character has just
been typed.

Without this condition, you'd keep inserting temporary ZWJs in non-Arabic
contexts too. Most of the time this would simply be useless, but in some
cases it could also cause unexpected results (e.g., with Indic scripts or
European ligatures).
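
The three conditions can be sketched roughly as follows. This is only an
illustration: display_stream and its parameters are hypothetical names, and
"word-forming Arabic character" is approximated by checking that the Unicode
character name starts with "ARABIC LETTER", which is cruder than a real
joining-type lookup.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER


def is_arabic_letter(ch):
    """Rough stand-in for 'word-forming Arabic character' (an assumption)."""
    return ch != "" and unicodedata.name(ch, "").startswith("ARABIC LETTER")


def display_stream(backing, caret, edited_since_move, just_typed):
    """Return the text to send to the screen; the backing store is untouched.

    backing           -- the stored text, in logical order
    caret             -- index of the insertion point in backing
    edited_since_move -- True if a local edit happened since the caret moved
    just_typed        -- the character just typed, or "" if the last action
                         was not a keystroke that inserted a character
    """
    next_ch = backing[caret] if caret < len(backing) else ""
    cond1 = not is_arabic_letter(next_ch)   # caret not before a joining letter
    cond2 = edited_since_move               # a local change has occurred
    cond3 = is_arabic_letter(just_typed)    # an Arabic letter was just typed
    if cond1 and cond2 and cond3:
        return backing[:caret] + ZWJ + backing[caret:]
    return backing
```

Because the joiner is added only to the returned string, saving or copying the
backing text never sees it.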

What about also adding a time delay? If nothing happens after a certain time
(say 1 or 2 seconds), the temporary ZWJ is removed.
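
One way the delay might be tracked, as a sketch only: TemporaryJoiner is a
hypothetical helper, the 1.5 second default is just a value inside the range
mentioned above, and the clock is injectable so the behaviour can be checked
without waiting.

```python
import time


class TemporaryJoiner:
    """Tracks whether the temporary ZWJ should still be displayed."""

    def __init__(self, timeout=1.5, clock=time.monotonic):
        self.timeout = timeout    # seconds of inactivity before removal
        self.clock = clock        # any callable returning seconds
        self.armed_at = None      # time of the last qualifying keystroke

    def arm(self):
        """Call when an Arabic letter has just been typed."""
        self.armed_at = self.clock()

    def disarm(self):
        """Call when the caret moves or a non-joining key is pressed."""
        self.armed_at = None

    def is_active(self):
        """True while the temporary ZWJ should still be rendered."""
        if self.armed_at is None:
            return False
        return self.clock() - self.armed_at < self.timeout
```

The renderer would consult is_active() on each repaint and schedule one extra
repaint when the timeout expires, so the last letter falls back to its final
form.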

(BTW, the same trick would also be possible with my approach. I just would
not use a ZWJ (which would not even exist in "pseudo-Unicode"), but would
rather directly change the code of the letter to the right of the cursor.)
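
A sketch of that variant, again using Arabic Presentation Forms-B as an
assumed stand-in for "pseudo-Unicode". In right-to-left text the letter to the
right of the cursor is the one just typed, i.e. the one before the caret in
logical order; the hypothetical open_letter_before_caret swaps its code from
isolated to initial, or from final to medial.

```python
import unicodedata

TAGS = ("<isolated>", "<final>", "<initial>", "<medial>")

# FORMS:   base letter -> {form name: presentation-form character}
# FORM_OF: presentation-form character -> (base letter, form name)
FORMS, FORM_OF = {}, {}
for cp in range(0xFE70, 0xFF00):
    parts = unicodedata.decomposition(chr(cp)).split()
    if len(parts) == 2 and parts[0] in TAGS:
        base, tag = chr(int(parts[1], 16)), parts[0][1:-1]
        FORMS.setdefault(base, {})[tag] = chr(cp)
        FORM_OF[chr(cp)] = (base, tag)

# Forms that end a word, and the forms that leave the letter open on its
# trailing side instead.
OPEN = {"isolated": "initial", "final": "medial"}


def open_letter_before_caret(editing_text, caret):
    """Show the letter just typed as if another letter were about to follow."""
    if caret == 0:
        return editing_text
    ch = editing_text[caret - 1]
    if ch not in FORM_OF:
        return editing_text  # not a positional form: nothing to change
    base, tag = FORM_OF[ch]
    replacement = FORMS[base].get(OPEN.get(tag, tag))
    if replacement is None:
        return editing_text  # e.g. Alef has no medial form
    return editing_text[:caret - 1] + replacement + editing_text[caret:]
```

The inverse swap would be applied when a space is typed, the caret moves away,
or the time delay expires.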

_ Marco
