Re: The "f" in "fi" (was Re: Latin ligatures and Unicode)

Date: Thu Dec 23 1999 - 12:15:42 EST


       Welcome to the hairy world of programming for complex scripts.

>But, once the ligature is formed, it becomes a problem for
       screen editing. What the users think they do when they edit an
       electronic document is to insert, delete, substitute, move or
       mark *characters*. What they actually do is perform symbolic actions on
       *glyphs*, which are the visual representation of characters, and
       this causes the software to actually change the characters in

       No, creating a ligature does not (or should not) cause any
       problems for screen editing. Software *renders* text as glyphs,
       but it should always allow the user to interact with the text
       in terms of characters. What this means is that any decent app
       that displays a fi ligature should allow the caret to be drawn
       in the middle of that glyph if, say, the caret is immediately
       to the left of that glyph and the user hits the right arrow
       key. When the caret is shown to the left, right or within the
       ligature glyph, the cursor into the text in memory should be
       pointing to the positions before the character f, between the
       characters f and i, and after the character i, respectively.
       Any editing operations the user performs will affect the
       characters in memory, and the screen will be redrawn to
       indicate the new state of affairs.
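
       To make the mapping concrete, here is a minimal sketch of how a
       renderer might turn a character position into a caret x offset
       when one glyph covers several characters. The Glyph class, the
       caret_x function and the pixel widths are all hypothetical, and
       splitting the ligature's advance evenly is an assumption made
       for illustration; a font may instead supply explicit caret
       positions for its ligatures (OpenType's GDEF ligature caret
       list is one such mechanism).

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One rendered glyph and the run of characters it covers (hypothetical model)."""
    start: int      # index of the first character this glyph covers
    length: int     # number of characters covered (2 for an fi ligature)
    advance: float  # glyph width in pixels (made-up values below)


def caret_x(glyphs, char_index):
    """Return the x offset of a caret sitting just before character `char_index`."""
    x = 0.0
    for g in glyphs:
        if char_index >= g.start + g.length:
            # Caret is at or past the end of this glyph: skip over it entirely.
            x += g.advance
        elif char_index > g.start:
            # Caret is inside a multi-character glyph: assume an even split.
            return x + g.advance * (char_index - g.start) / g.length
        else:
            # Caret is exactly at the leading edge of this glyph.
            return x
    return x


# "fil" rendered as an fi ligature (12 px wide) followed by "l" (4 px wide).
glyphs = [Glyph(start=0, length=2, advance=12.0),
          Glyph(start=2, length=1, advance=4.0)]

positions = [caret_x(glyphs, i) for i in range(4)]
# positions == [0.0, 6.0, 12.0, 16.0]
# index 1 (between f and i) lands in the middle of the ligature glyph.
```

       The text in memory never contains a "ligature character"; only
       the glyph list does, so the caret index stays a plain character
       index throughout.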

>Everybody already noticed that, if the "m" in the above
       examples is substituted with an "f", we are going to have
       troubles. In a system that displays "f" + "i" as a ligature, I
       cannot move the caret in the proper place in "fil" to add my
       "a". I can certainly delete the "a" in "faile" but, after I do
       this, my caret remains in an embarrassing location: "somewhere
       *inside* a ligature".

       If software is implemented properly, there is no reason why we
       shouldn't be able to deal with these issues. It should be
       possible to position the caret within the ligature so that you
       can add an "a" between "f" and "i"; deleting the "a" in "fai"
       will leave the caret within the ligature, and that is exactly
       the desired behaviour. Any software that doesn't do this is
       deficient. IMHO.
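
       A rough sketch of that behaviour, under stated assumptions: the
       shape function below is a toy stand-in for a real shaping engine
       (it only knows about "f" + "i"), and the Buffer class and the
       "fi_lig" glyph name are hypothetical. The point is only that
       edits act on characters and the glyphs are recomputed afterwards.

```python
def shape(text):
    """Toy shaper: return (glyph_name, start, length) tuples, pairing f+i into one glyph."""
    glyphs = []
    i = 0
    while i < len(text):
        if text.startswith("fi", i):
            glyphs.append(("fi_lig", i, 2))  # one glyph covering two characters
            i += 2
        else:
            glyphs.append((text[i], i, 1))
            i += 1
    return glyphs


class Buffer:
    """Text in memory plus a caret expressed as a character index."""

    def __init__(self, text, caret=0):
        self.text = text
        self.caret = caret

    def insert(self, s):
        # Insert at the caret and move the caret past the new characters.
        self.text = self.text[:self.caret] + s + self.text[self.caret:]
        self.caret += len(s)

    def backspace(self):
        # Delete the character before the caret, if there is one.
        if self.caret > 0:
            self.text = self.text[:self.caret - 1] + self.text[self.caret:]
            self.caret -= 1


buf = Buffer("fil", caret=1)      # caret between f and i, i.e. inside the ligature
buf.insert("a")                   # text is now "fail"; the ligature no longer forms
glyphs_after_insert = shape(buf.text)

buf.backspace()                   # delete the "a"; text is "fil" again
glyphs_after_delete = shape(buf.text)
# The ligature re-forms and the caret (index 1) sits inside it.
```

       After the backspace, the caret index is 1 and the only glyph
       covering it is the two-character ligature, which is the "caret
       within the ligature" state described above.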

>What can programmers do about this? Some approaches:

>#1 Avoid ligatures. - This is not acceptable in a WYSIWYG
       environment and, for certain scripts, this is not acceptable
       even in the humblest text-only interface.

>#2 Split ligatures when the caret passes over them. - This is
       the same as #1 above, only less frequent.

>#3 Once a ligature is formed, treat it as if it was a single
       unit. - Most people, although perfectly literate, never noticed
       that "fl" looks slightly different from, say, "fb" or "fh". Do
       you want them to notice it just to decide they don't like
       *your* software?

>#4 Pretend that the "ffl" glyph represents the first "f" only;
       the second "f" and the "l" would then be zero-width things
       following the visible glyph. - This is the same as #3 above,
       but even more puzzling.

       None of these are acceptable, though #2 might be tolerable.

>But if our font represents an "fi" ligature as two ad-hoc
       artificial glyphs (plus an ad hoc kerning pair, plus an ad hoc
       contextual shaping rule), we obtain a double score:
       - The display looks pretty, just like a printed book;
       - The user's perception that characters = visible glyphs =
       keyboard strokes may be supported, for the sake of usability.

       If our software is done right, we just shouldn't need to resort
       to these kinds of kludges.

>Finally, the virama in the consonant clusters of many Indic
       scripts is *really* invisible and there is no way we can
       visualize it *and* claim we are WYSIWYG.

       Now you're getting into some interesting UI challenges for
       which I don't think standard solutions have been developed. The
       invisible character is really there in the text, and so a user
       ought to be able to manipulate it. But how can they do so if
       they can't even see it? This is also true for things like ZWSP,
       ZWJ, etc.

>For the "impossible" cases like the invisible viramas, I would
       step back to #3 above, trying to enforce the user's perception
       that virama is *not* a character by itself.

       There's another possibility. Consider this: the distinction
       between SPACE and NBSP isn't visible, but most word processors
       provide a "display non-printing characters" option where some
       visual cue is provided. This could be utilised for things
       like virama, ZWSP, ZWJ, etc: when the option is enabled, then
       these things appear, preferably using some representation that
       identifies them unmistakeably, such as "ZWJ" inside a dotted

       There are other possibilities:
       - change the shape of the caret (think of split cursors)
       - draw some kind of symbol, possibly coloured, above, below or
       on top of the glyphs surrounding the position of the invisible
       character; e.g. draw two small arrows pointed toward each other
       in red below the line to indicate ZWJ, and draw the arrows
       pointing away from each other to indicate ZWNJ
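
       As a very rough sketch of the "display non-printing characters"
       idea, the function below substitutes a visible label for each
       invisible character when the option is switched on. The reveal
       function, the LABELS table and the square brackets are all
       assumptions for illustration; the brackets merely stand in for
       whatever visual treatment a real renderer would draw, and the
       virama entry covers only the Devanagari one.

```python
# Code points mapped to the labels shown when the option is enabled.
LABELS = {
    "\u00A0": "NBSP",    # NO-BREAK SPACE
    "\u200B": "ZWSP",    # ZERO WIDTH SPACE
    "\u200C": "ZWNJ",    # ZERO WIDTH NON-JOINER
    "\u200D": "ZWJ",     # ZERO WIDTH JOINER
    "\u094D": "VIRAMA",  # DEVANAGARI SIGN VIRAMA
}


def reveal(text, show_marks=True):
    """Return a display string in which invisible characters are replaced by labels."""
    if not show_marks:
        return text
    out = []
    for ch in text:
        if ch in LABELS:
            out.append("[" + LABELS[ch] + "]")
        else:
            out.append(ch)
    return "".join(out)


sample = "a\u200Db\u00A0c"
shown = reveal(sample)                       # "a[ZWJ]b[NBSP]c"
hidden = reveal(sample, show_marks=False)    # unchanged: same characters as sample
```

       Only the display string changes here; the text in memory keeps
       the original characters, so a real editor would still need to
       map caret positions in the labelled display back to character
       indices.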

       The field is open for anybody and everybody to think of the
       best mechanisms to provide visual feedback to deal with these
       things. There will be lots of crazy ideas that will get
       rejected, but someone will come up with something clever, and
       if marketers and lawyers don't get in the way, we'll all be
       able to use the best ideas until, eventually, these things
       become as standard as double clicking.

