RE: Unicode editing (RE: Unicode complaints)

From: Marco Cimarosti
Date: Mon Mar 19 2001 - 07:51:19 EST

Roozbeh Pournader wrote:
> > - All sequences of characters that are perceived as single letters by
> > users are treated as such (e.g., laam-alif in Arabic or the ksha ligature
> > in many Indic scripts). Of course, the "DErenderer" maps these extra
> > glyphs back to the corresponding sequences of Unicode characters.
>
> "Lam-Alef" is not considered a single letter by Persian users. No recent
> Persian keyboard has it. I also believe that Arabic keyboards have it
> because of backward-compatibility only.

But I also talked about an "automatic adjustment" of shapes during typing.
This can (and should) also be extended to merging two glyphs into one.

And this could happen with lam-alef, not only for users who don't have the
ligature on their keyboard, but also for users who have it but fail to use
it.

But you can always turn off the "automatic adjustment" and type an unusual
combination of shapes, including "initial lam + final alef", if that is

By the way, this reminds me of a point that could be interesting to you: a
sequence that violates the normal ligating behavior (like the "initial lam +
final alef" above) would automatically generate the proper Unicode sequence,
such as LAM+ZWJ+ZWNJ+ZWJ+ALEF, with no need for users to know about the
sequence ZWJ+ZWNJ+ZWJ, which I know you don't like.
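
To make the idea concrete, here is a minimal sketch in Python of how a
"DErenderer" might map the shapes a user picked for lam + alef back to
Unicode characters. The function name and the shape labels are hypothetical,
and the "isolated+isolated" row is my own assumption based on the usual
meaning of ZWNJ; only the LAM+ZWJ+ZWNJ+ZWJ+ALEF row comes from the
discussion above.

```python
LAM = "\u0644"
ALEF = "\u0627"
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER

# Hypothetical shape labels an editor might use internally.
LAM_ALEF_SEQUENCES = {
    # The default ligature needs no control characters at all.
    "ligature": LAM + ALEF,
    # Joined forms without the ligature: the sequence discussed above.
    "initial+final": LAM + ZWJ + ZWNJ + ZWJ + ALEF,
    # Assumption: two unconnected letters only need a ZWNJ.
    "isolated+isolated": LAM + ZWNJ + ALEF,
}


def derender_lam_alef(shape):
    """Return the Unicode sequence for the chosen lam + alef shape."""
    return LAM_ALEF_SEQUENCES[shape]
```

The user only ever chooses a shape; the control characters are an
implementation detail of the editor.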

> In the Arabic case, this is old behaviour, one that should be
> avoided at all costs.

Well, people who are used to it may think differently. And there may be
cases when the same user wants both behaviors, depending on what she is

> Many Persian keyboards have ZWNJ and ZWJ on them, and the
> important thing is that the users feel at home with them, [...]

Nothing prevents using the ZWJ and ZWNJ keys as *function keys* that force
joining or splitting of the characters near the cursor.

What the user sees is the same anyway. What ends up in the actual Unicode
file is also the same, except that you automatically get rid of unnecessary
ZW(N)J characters (i.e. a ZWJ between two characters that would join anyway,
or a ZWNJ between two characters that cannot join).
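
As a rough illustration of that cleanup, the Python sketch below drops a ZWJ
that sits between two letters that would join anyway, and a ZWNJ that sits
between two letters that cannot join. The joining table is a deliberately
tiny, hand-written assumption (a real editor would take joining types from
the Unicode data files), combining marks are ignored, and all names are
hypothetical.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER

# Assumed mini-table of joining behavior for a handful of Arabic letters.
DUAL_JOINING = set("\u0628\u0644\u0645\u0647")   # BEH, LAM, MEEM, HEH
RIGHT_JOINING = set("\u0627\u062F\u0631\u0648")  # ALEF, DAL, REH, WAW


def would_join(prev, nxt):
    """True if prev (earlier in logical order) connects to nxt by default."""
    return prev in DUAL_JOINING and (nxt in DUAL_JOINING or nxt in RIGHT_JOINING)


def strip_redundant_joiners(text):
    """Remove ZWJ/ZWNJ characters that do not change the default joining."""
    out = []
    for i, ch in enumerate(text):
        if ch in (ZWJ, ZWNJ):
            prev = text[i - 1] if i > 0 else ""
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # Runs such as ZWJ+ZWNJ+ZWJ carry meaning as a whole: keep them.
            if prev not in (ZWJ, ZWNJ) and nxt not in (ZWJ, ZWNJ):
                joins = would_join(prev, nxt)
                if (ch == ZWJ and joins) or (ch == ZWNJ and not joins):
                    continue  # redundant: skip it
        out.append(ch)
    return "".join(out)
```

With this table, LAM+ZWJ+ALEF collapses to LAM+ALEF and ALEF+ZWNJ+LAM to
ALEF+LAM, while LAM+ZWNJ+ALEF and LAM+ZWJ+ZWNJ+ZWJ+ALEF are left alone.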

_ Marco

