Re: The "f" in "fi" (was RE: Latin ligatures and Unicode)

From: Mark E. Davis
Date: Thu Dec 23 1999 - 12:46:36 EST

You might take a look at

for a discussion of layout issues and careting. We've found it important to
allow a glyph's metric information to be subdivided, so that the glyph is not
measured purely as a unit, independently of the underlying characters that it
represents. This allows an additional option:

#5. Maintain metric information for the virtual parts of the ligature glyph that
correspond to individual characters. This allows a caret to be drawn in the
middle of an "fi" ligature.

A "poor man's" version of this is to just keep track of how many characters a
glyph represents, and divide the advance width by that number when placing the
caret. Of course, this needs to be synchronized with the highlighting and
mouse-clicking, as discussed in the paper above. A slightly more sophisticated
approach includes the widths of the parts in association with the glyph. That
way the caret in a "ct" ligature can come optically between the c and the t,
rather than just at the midpoint.
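
To make the two variants concrete, here is a minimal sketch in Python. The
Glyph record and its field names (advance, char_count, part_widths) are
assumptions made up for illustration; they are not taken from the paper or
from any real font API. Without per-part widths the function falls back to
the "poor man's" even split.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Glyph:
    """Hypothetical glyph record; the field names are illustrative only."""
    advance: float                                  # total advance width
    char_count: int                                 # characters the glyph represents
    part_widths: Optional[Sequence[float]] = None   # optional width per character


def caret_offsets(glyph: Glyph) -> List[float]:
    """Return the x offset of every caret stop, from the glyph's leading edge.

    There are char_count + 1 stops: one before the first character, one
    between each pair of characters, and one after the last.
    """
    if glyph.part_widths is not None:
        if len(glyph.part_widths) != glyph.char_count:
            raise ValueError("need exactly one width per represented character")
        # More sophisticated version: accumulate the stored part widths.
        offsets = [0.0]
        for width in glyph.part_widths:
            offsets.append(offsets[-1] + width)
        return offsets
    # Poor man's version: split the advance width evenly.
    step = glyph.advance / glyph.char_count
    return [i * step for i in range(glyph.char_count + 1)]


fi = Glyph(advance=10.0, char_count=2)                          # even split
ct = Glyph(advance=12.0, char_count=2, part_widths=(7.0, 5.0))  # optical split
print(caret_offsets(fi), caret_offsets(ct))
```

With the made-up numbers above, the caret inside the "ct" glyph lands at 7.0
rather than at the midpoint 6.0.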

For glyphs that cannot be horizontally split (like lam-meem-jeem-initial), or
glyphs that do not represent a contiguous sequence of characters in memory (such
as where Indic glyph rearrangement changes the order of glyphs that then
ligate), the poor-man's approach is to just treat them as an atomic unit. A more
sophisticated approach could use non-horizontal divisions, but this is probably
not worth the investment in time and energy to support.
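
As a rough illustration of the atomic-unit fallback, and of keeping
mouse-clicking in step with the caret, here is a small Python sketch. The
function names and the "splittable" flag are hypothetical, and offsets are
measured from the glyph's leading edge in a single direction for simplicity
(a real implementation would have to account for right-to-left runs).

```python
from typing import List, Tuple

# A caret stop is (x offset within the glyph, character index within the glyph).
CaretStop = Tuple[float, int]


def caret_stops(advance: float, char_count: int, splittable: bool) -> List[CaretStop]:
    """Return the caret stops for one glyph.

    An unsplittable glyph is treated as an atomic unit: it only has stops
    at its two edges, and crossing it skips all of its characters at once.
    """
    if not splittable or char_count == 1:
        return [(0.0, 0), (advance, char_count)]
    step = advance / char_count  # the even "poor man's" split
    return [(i * step, i) for i in range(char_count + 1)]


def hit_test(stops: List[CaretStop], x: float) -> int:
    """Map a click at offset x to the character index of the nearest stop.

    Using the same stops for clicking, highlighting and caret drawing is
    what keeps the three in sync.
    """
    _, index = min(stops, key=lambda stop: abs(stop[0] - x))
    return index


print(hit_test(caret_stops(10.0, 2, splittable=True), 4.0))    # inside a two-character ligature
print(hit_test(caret_stops(30.0, 3, splittable=False), 20.0))  # atomic three-character glyph
```

For the atomic glyph a click can only resolve to index 0 or index 3, never to
a position inside the glyph.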


P.S. I reference it off my homepage. There are a few other
papers there you might be interested in.

Marco wrote:

> John Cowan recently wrote:
> >ZWL, though, does not cause "f" to become "the f-form used with i
> >following", nor "i" to become "the i-form used with f preceding",
> >because there are no such things, and it would be intolerably ad hoc
> >to make them so.
> My first thought about this was "Right, of course: a ligated 'fi' is a
> single glyph, whether or not it is used to render a single code point or it
> is just an 'f' + 'i' sequence".
> However, for the following two days, I could not stop visualizing
> things like "the 'f' in 'fl'", "the 'i' in 'fi'", "the 'alif' in
> 'lâm-alif'", etc.
> In a WYSIWYG environment, everybody expects that any instance of sequences
> like "f" + "i" is displayed as a ligature, if the font so permits and
> dictates (well, everybody but some friends of mine:-).
> But, once the ligature is formed, it becomes a problem for screen editing.
> What the users think they do when they edit an electronic document is to
> insert, delete, substitute, move or mark *characters*. What they actually do
> is perform symbolic actions on *glyphs*, which are the visual representation of
> characters, and this causes the software to actually change the characters
> in memory.
> If I wanted to type "mail" but inadvertently wrote "mil", what I want to do
> is to move my caret between the "m" and the "i" and add the missing "a". And
> I can do it.
> If I wanted to type "mile" but inadvertently wrote "maile", what I want to
> do is to move my caret between the "m" and the "a" and hit the
> DEL-RIGHT-CHAR key. I can do it, and my caret correctly remains where the
> "a" used to be: between the "m" and the "i".
> Everybody has already noticed that, if the "m" in the above examples is
> replaced with an "f", we are going to have trouble. In a system that
> displays "f" + "i" as a ligature, I cannot move the caret to the proper
> place in "fil" to add my "a". I can certainly delete the "a" in "faile" but,
> after I do this, my caret remains in an embarrassing location: "somewhere
> *inside* a ligature".
> What can programmers do about this? Some approaches:
> #1 Avoid ligatures. - This is not acceptable in a WYSIWYG environment and,
> for certain scripts, this is not acceptable even in the humblest text-only
> interface.
> #2 Split ligatures when the caret passes over them. - This is the same as #1
> above, only less frequent.
> #3 Once a ligature is formed, treat it as if it was a single unit. - Most
> people, although perfectly literate, never noticed that "fl" looks slightly
> different from, say, "fb" or "fh". Do you want them to notice it just to
> decide they don't like *your* software?
> #4 Pretend that the "ffl" glyph represents the first "f" only; the second
> "f" and the "l" would then be zero-width things following the visible glyph.
> - This is the same as #3 above, but even more puzzling.
> But if our font represents an "fi" ligature as two ad-hoc artificial glyphs
> (plus an ad hoc kerning pair, plus an ad hoc contextual shaping rule), we
> obtain a double score:
> - The display looks pretty, just like a printed book;
> - The user's perception that characters = visible glyphs = keyboard strokes
> may be supported, for the sake of usability.
> Of course, this idea has its problems too. It is easy to see the single
> letters in Latinate "ffl"; seeing the single letters in Arabic
> "lâm-alif" is also easy, but not as much; but seeing the "mîm" in "lâm-mîm" is
> admittedly quite hard; and spotting the "ka" and "sha" in Devanagari "ksha"
> requires some historical lessons from Peter T. Daniels himself.
> Finally, the virama in the consonant clusters of many Indic scripts is
> *really* invisible and there is no way we can visualize it *and* claim we
> are WYSIWYG.
> For the "difficult" cases like "lâm-mîm" or "ksha", my idea would be to
> decide arbitrary borders within the glyph, hoping that the user will follow
> the reasoning.
> For the "impossible" cases like the invisible viramas, I would step back to
> #3 above, trying to enforce the user's perception that virama is *not* a
> character by itself. One way to suggest this is to define a keyboard where
> each "full consonant" is assigned to a plain key, and each "dead consonant"
> (consonant+virama, in the encoding) is assigned to the corresponding shifted
> key. No key should be assigned to virama itself: when someone exceptionally
> needs a stand-alone virama (e.g. in a didactic text), they would enter it
> through less direct methods (e.g. using some "Insert Symbol" menu command).
> That said, I wish you all a pleasant Winter Solstice/Christmas/End-of-Ramadan
> Id. See y'all post-Y2K-bug.
> _ Marco

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:57 EDT