Welcome to the hairy world of programming for complex scripts.
>But, once the ligature is formed, it becomes a problem for
screen editing. What the users think they do when they edit an
electronic document is to insert, delete, substitute, move or
mark *characters*. What they actually do is perform symbolic actions
on *glyphs*, which are the visual representation of characters, and
this causes the software to actually change the characters in
No, creating a ligature does not (or should not) cause any
problems for screen editing. Software *renders* text as glyphs,
but it should always allow the user to interact with the text
in terms of characters. What this means is that any decent app
that displays a fi ligature should allow the caret to be drawn
in the middle of that glyph if, say, the caret is immediately
to the left of that glyph and the user hits the right arrow
key. When the caret is shown to the left, right or within the
ligature glyph, the cursor into the text in memory should be
pointing to the positions before the character f, between the
characters f and i, and after the character i, respectively.
Any editing operations the user performs will affect the
characters in memory, and the screen will be redrawn to
indicate the new state of affairs.
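
To make the caret-to-character mapping above concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the `Glyph` record, the toy `shape` function, the ligature list and the fixed advance widths are all assumptions, and dividing a ligature's width evenly among its characters is just one simple policy (a font may instead supply its own caret positions for ligatures, for example via the OpenType GDEF ligature caret list).

```python
from dataclasses import dataclass

# Hypothetical values, chosen only to keep the arithmetic easy to follow.
LIGATURES = ("ffl", "ffi", "fi", "fl")   # longest first, so "ffi" wins over "fi"
ADVANCE = 10.0                           # width of an ordinary glyph
LIG_ADVANCE = 8.0                        # width per character inside a ligature


@dataclass
class Glyph:
    text: str     # the characters this glyph renders, e.g. "fi"
    x: float      # left edge of the glyph
    width: float  # advance width of the glyph


def shape(text):
    """Toy shaper: turn characters into glyphs, forming ligatures greedily."""
    glyphs, i, x = [], 0, 0.0
    while i < len(text):
        chars = next((l for l in LIGATURES if text.startswith(l, i)), text[i])
        width = LIG_ADVANCE * len(chars) if len(chars) > 1 else ADVANCE
        glyphs.append(Glyph(chars, x, width))
        x += width
        i += len(chars)
    return glyphs


def caret_stops(glyphs):
    """Return (character index, x position) for every character boundary.

    A ligature contributes one stop per character it covers, so the caret
    can sit *inside* the glyph.
    """
    stops, index = [], 0
    for g in glyphs:
        n = len(g.text)
        for k in range(n):
            stops.append((index + k, g.x + g.width * k / n))
        index += n
    end_x = glyphs[-1].x + glyphs[-1].width if glyphs else 0.0
    stops.append((index, end_x))
    return stops


def insert_at(text, caret, s):
    """Edit the characters in memory; return the new text and caret index."""
    return text[:caret] + s + text[caret:], caret + len(s)


def delete_before(text, caret):
    """Backspace: remove the character before the caret."""
    if caret == 0:
        return text, caret
    return text[:caret - 1] + text[caret:], caret - 1


if __name__ == "__main__":
    print(caret_stops(shape("fil")))   # the stop at index 1 lies inside "fi"
    text, caret = insert_at("fil", 1, "a")
    print(text, caret)
    text, caret = delete_before(text, caret)
    print(text, caret)                 # the caret is back inside the ligature
```

The point of the sketch is that the edit functions never look at glyphs at all: they only touch the character string and the caret index, and the display is recomputed from the new string afterwards.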
>Everybody already noticed that, if the "m" in the above
examples is substituted with an "f", we are going to have
troubles. In a system that displays "f" + "i" as a ligature, I
cannot move the caret in the proper place in "fil" to add my
"a". I can certainly delete the "a" in "faile" but, after I do
this, my caret remains in an embarrassing location: "somewhere
*inside* a ligature".
If software is implemented properly, there is no reason why we
shouldn't be able to deal with these issues. It should be
possible to position the caret within the ligature so that you
can add an "a" between "f" and "i"; deleting the "a" in "fai"
will leave the caret within the ligature, and that is exactly
the desired behaviour. Any software that doesn't do this is
>What can programmers do about this? Some approaches:
>#1 Avoid ligatures. - This is not acceptable in a WYSIWYG
environment and, for certain scripts, this is not acceptable
even in the humblest text-only interface.
>#2 Split ligatures when the caret passes over them. - This is
the same as #1 above, only less frequent.
>#3 Once a ligature is formed, treat it as if it was a single
unit. - Most people, although perfectly literate, never noticed
that "fl" looks slightly different from, say, "fb" or "fh". Do
you want them to notice it just to decide they don't like
>#4 Pretend that the "ffl" glyph represents the first "f" only;
the second "f" and the "l" would then be zero-width things
following the visible glyph. - This is the same as #3 above,
but even more puzzling.
None of these are acceptable, though #2 might be tolerable.
>But if our font represents an "fi" ligature as two ad hoc
artificial glyphs (plus an ad hoc kerning pair, plus an ad hoc
contextual shaping rule), we obtain a double score:
- The display looks pretty, just like a printed book;
- The user's perception that characters = visible glyphs =
keyboard strokes may be supported, for the sake of usability.
If our software is done right, we just shouldn't need to resort
to these kinds of kludges.
>Finally, the virama in the consonant clusters of many Indic
scripts is *really* invisible and there is no way we can
visualize it *and* claim we are WYSIWYG.
Now you're getting into some interesting UI challenges for
which I don't think standard solutions have been developed. The
invisible character is really there in the text, and so a user
ought to be able to manipulate it. But how can they do so if
they can't even see it? This is also true for things like ZWSP,
>For the "impossible" cases like the invisible viramas, I would
step back to #3 above, trying to enforce the user's perception
that virama is *not* a character by itself.
There's another possibility. Consider this: the distinction
between SPACE and NBSP isn't visible, but most word processors
provide a "display non-printing characters" option where some
visual cue is provided. This could be utilised for things
like virama, ZWSP, ZWJ, etc: when the option is enabled, then
these things appear, preferably using some representation that
identifies them unmistakeably, such as "ZWJ" inside a dotted
There are other possibilities:
- change the shape of the caret (think of split cursors)
- draw some kind of symbol, possibly coloured, above, below or
on top of the glyphs surrounding the position of the invisible
character; e.g. draw two small arrows pointed toward each other
in red below the line to indicate ZWJ, and draw the arrows
pointing away from each other to indicate ZWNJ
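
As a rough illustration of the "display non-printing characters" idea, here is a small Python sketch. It is only a text-mode stand-in: the `reveal` function, the `MARKERS` table and the square-bracket labels are hypothetical choices, and a real editor would presumably draw a distinct glyph or overlay rather than inserting label text. Only Devanagari virama (U+094D) is listed as an example; other scripts have their own virama characters.

```python
# Hypothetical table of characters that normally have no visible form of
# their own, mapped to a short label to show in "reveal" mode.
MARKERS = {
    "\u00A0": "NBSP",    # NO-BREAK SPACE
    "\u200B": "ZWSP",    # ZERO WIDTH SPACE
    "\u200C": "ZWNJ",    # ZERO WIDTH NON-JOINER
    "\u200D": "ZWJ",     # ZERO WIDTH JOINER
    "\u094D": "VIRAMA",  # DEVANAGARI SIGN VIRAMA
}


def reveal(text, show=True):
    """Return a display string in which invisible characters are labelled.

    The underlying text is not modified; this only affects what is shown.
    """
    if not show:
        return text
    return "".join(f"[{MARKERS[ch]}]" if ch in MARKERS else ch for ch in text)


if __name__ == "__main__":
    print(reveal("a\u200Db"))           # ZWJ between two letters
    print(reveal("non\u00A0breaking"))  # NBSP instead of SPACE
    print(reveal("a\u200Db", show=False))
```

Because the labels exist only in the display string, the user can toggle the option without the text in memory changing.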
The field is open for anybody and everybody to think of the
best mechanisms to provide visual feedback to deal with these
things. There will be lots of crazy ideas that will get
rejected, but someone will come up with something clever, and
if marketers and lawyers don't get in the way, we'll all be
able to use the best ideas until, eventually, these things
become as standard as double clicking.
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:57 EDT