Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

From: John H. Jenkins
Date: Mon Jul 01 2002 - 16:57:51 EDT

On Monday, July 1, 2002, at 02:08 PM, Asmus Freytag wrote:

> At 11:34 AM 6/30/02 -0600, John H. Jenkins wrote:
>> Remember, Unicode is aiming at encoding *plain text*. For the bulk of
>> Latin-based languages, ligation control is simply not a matter of *plain
>> text*—that is, the message is still perfectly correct whether ligatures
>> are on or off. There are some exceptional cases. The ZWJ/ZWNJ is
>> available for such exceptional cases.
> Remember also that the simplistic model you present already breaks down
> for German, since the same character pair may or may not allow ligation
> depending on the content and meaning of the text - features that in the
> Unicode model are relegated to *plain* text.

*sigh* I'm clearly not expressing myself well here.

I'm trying to state the general rule. Each time I do, I say there are
exceptions. German is an excellent example of an exception. Michael's
exceptional cases are exceptional cases. We put ZWJ/ZWNJ in charge of
plain-text ligature formation to handle these cases. I'm fine with that.

Turkish is another exception, BTW, where the typical "fi" ligature of
Latin typography should not be formed.
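
To make the mechanism concrete, here is a minimal sketch of marking such an exceptional case in plain text. It assumes Python, and the helper names are hypothetical, chosen only for illustration. The German word "Auflage" (Auf + lage) is an example brought in here, not one from this thread: the "fl" pair straddles a morpheme boundary, so ligation would be prohibited there. Whether a given font and layout engine honors the request is outside what this code can show.

```python
# Illustrative sketch only: helper names are hypothetical, not from any library.
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: asks that no ligature be formed here
ZWJ = "\u200D"   # ZERO WIDTH JOINER: asks that a ligature be formed here


def prohibit_ligature(text: str, index: int) -> str:
    """Return text with ZWNJ inserted just before text[index]."""
    return text[:index] + ZWNJ + text[index:]


def request_ligature(text: str, index: int) -> str:
    """Return text with ZWJ inserted just before text[index]."""
    return text[:index] + ZWJ + text[index:]


def strip_ligature_controls(text: str) -> str:
    """Remove ZWJ/ZWNJ, e.g. before a naive search or comparison."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


# "Auflage" = Auf + lage: mark the boundary between "f" and "l".
word = "Auflage"
marked = prohibit_ligature(word, 3)

# The marked string is one code point longer but reads the same once stripped.
print(len(word), len(marked))
print(strip_ligature_controls(marked) == word)
```

The control character travels with the text itself, which is the point of the plain-text mechanism: the same string carries the prohibition to any renderer, with no styling information needed.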

The issue -- as I see it -- is not whether or not *any* ligature control
belongs in plain text, or whether or not mandatory/prohibited ligation
points should be marked in plain text. I'm not aware of anyone who is
arguing against that position.

We started out with a discussion of whether or not we should add more
Latin ligatures (whether in the PUA or elsewhere) so that people can, in
essence, create a plain-text representation of an older book where such
were more common. (And, as always, if my memory is inaccurate please feel
free to correct me here.) This is not an appropriate use of plain text
IMHO. I do not believe, moreover, that the ZWJ/ZWNJ mechanism is
appropriate for this sort of thing. This is rich text, and other ligation
controls should be used.

> Therefore, I would be much happier if the discussion of the 'standard'
> case wasn't as anglo-centric and allowed more directly for the fact that
> while fonts are in control of what ligatures are provided, layout engines
> may be in control of what and how many optional ligatures to use, the
> text (!) must be in control of where ligatures are mandatory or
> prohibited.

Which is what Unicode 3.2 says. (You said it very nicely here, though.)

(The standard case, BTW, seems to be Anglo-centric largely because this is
an English-speaking list and people always seem to start out with the "ct"
ligature they'd like to put in words like "respectfully." Sorry about

John H. Jenkins
