Re: [OpenType] Proposal: Ligatures w/ ZWJ in OpenType

From: Eric Muller
Date: Fri Jul 12 2002 - 20:38:35 EDT

The mechanism proposed by John to handle ZWJ/ZWNJ makes the implicit
assumption that those characters are transformed into glyphs (via the
usual 'cmap' mechanism) and that this is the avenue to transfer the
intent of those characters to the shaping code in the font (i.e. some
kind of ligature lookup). I'd like to revisit that assumption.

The ZWJ/ZWNJ characters are formatting characters. Their function is
definitely different from the function of the "regular" characters (such
as "A"): they are a way to control the rendering of regular characters
around them, and to express that control in plain text. The debate so
far shows that there is no strong objection to that mechanism by itself.

In an environment richer than plain text, there is obviously the
possibility that this control could be expressed by other means than
characters. In the OpenType world, and in particular in the interface
between the layout engine and the shaping code in fonts, we have more
than plain text, or rather plain glyphs; we also have a description of
which features should be applied to which glyphs. So instead of having
glyphs that stand for ZWJ/ZWNJ, can we use these features?

In fact, we already do that every day. For example, an InDesign user can
insert the two characters x and y, and apply a ligature feature (let's
say 'dlig') to them. It seems to me that this is just what ZWJ is about.
So InDesign could do the following given the character sequence x ZWJ y:
map it to the glyph sequence cmap(x) cmap(y), with 'dlig' applied on those
two glyphs. This 'dlig' application takes precedence over one via UI,
i.e. it happens regardless of whether the user requested 'dlig'
explicitly. The ZWJ character is simply not mapped to the glyph stream,
since the feature application does the job of ZWJ.

We can handle ZWNJ in the same way: the sequence x ZWNJ y is transformed
to the glyph sequence cmap(x) cmap(y), with 'dlig' not applied on those
two glyphs. This 'dlig' non-application takes precedence over one via
UI, i.e. 'dlig' is not applied to these two glyphs regardless of whether
the user requested 'dlig' explicitly.
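
As a rough illustration of the ZWJ and ZWNJ cases above, here is a minimal
Python sketch of what a layout engine might do. Everything named in it is an
assumption made for the example: the function to_glyph_stream, the dict
standing in for 'cmap', the glyph names, and the list holding one 'dlig'
decision per adjacent glyph pair. None of this is taken from OpenType or
from any real engine.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def to_glyph_stream(text, cmap, user_dlig):
    """Map characters to glyphs; ZWJ/ZWNJ become 'dlig' decisions, not glyphs.

    Returns (glyphs, pair_dlig), where pair_dlig[i] says whether 'dlig' is
    applied to the pair (glyphs[i], glyphs[i + 1]). This per-pair layout is
    a hypothetical representation chosen for the sketch.
    """
    glyphs = []
    pair_dlig = []
    override = None  # True after ZWJ, False after ZWNJ, None otherwise
    for ch in text:
        if ch == ZWJ:
            override = True
            continue  # no glyph is emitted for ZWJ
        if ch == ZWNJ:
            override = False
            continue  # no glyph is emitted for ZWNJ
        if glyphs:
            # The formatting character, if any, wins over the UI setting.
            pair_dlig.append(user_dlig if override is None else override)
        glyphs.append(cmap[ch])
        override = None
    return glyphs, pair_dlig
```

With a toy cmap of {"x": "gx", "y": "gy"}, the sequence x ZWJ y yields the
glyphs gx gy with 'dlig' on for the pair even when the user setting is off,
and x ZWNJ y yields the same glyphs with 'dlig' off even when the user
setting is on.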

[Maybe a better way of thinking about the precedence stuff is to think
entirely in markup terms:
<ligatures-on> ... x ZWNJ y ... </ligatures-on> is transformed into the
glyph stream <dlig> ... cmap(x) </dlig> <dlig> cmap(y) ... </dlig>, i.e.
dlig is off on the pair x y; hold your objection that a feature is
applied to a position rather than a range for a minute.]
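
The markup view can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the function name dlig_ranges, the
dict standing in for 'cmap', and the glyph names are invented for the
example, and the input is taken to be text that already sits inside
<ligatures-on>.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def dlig_ranges(text, cmap):
    """Split text inside <ligatures-on> into glyph ranges with 'dlig' on.

    Each ZWNJ closes the current range and opens a new one, so the glyphs
    on either side of it never share a range. The ZWNJ itself produces no
    glyph. Empty ranges (e.g. from a leading ZWNJ) are dropped.
    """
    ranges = [[]]
    for ch in text:
        if ch == ZWNJ:
            ranges.append([])  # </dlig> <dlig>
        else:
            ranges[-1].append(cmap[ch])
    return [r for r in ranges if r]
```

For the characters a x ZWNJ y b this gives two ranges, one ending in cmap(x)
and one starting with cmap(y), which is the sense in which dlig is off on the
pair x y while still being on for everything else.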

With this approach, we gain two things. First, not having a "formatting"
glyph for ZWJ is IMHO a huge conceptual win, even bigger than not having
a "formatting" character ZWJ would be. Second, what John's proposal did
not mention (or maybe I missed it) is that it's not just the ligature
features that have to deal with this glyph, it is all the features;
compound that by all the formatting characters, and you will start to
understand Paul's reaction.

It's interesting to note that this approach can be applied to other
formatting characters as well. Either their intent can be achieved by
the layout engine alone, without the help of the font, in which case
there is no need to show anything to the code in the font; no glyph and
no feature are a consequence of those characters. Or their intent needs
the help of the font, and the OpenType way to ask for this help is to
apply (or not) features.

All that takes care of selecting a ligature, but it does not quite take
care of selecting cursive forms. I can see how we could define 'dlig' to
do that (or define a 'zwj' feature that invokes the ligature lookups
plus some single substitution lookup), but I am not sure I am happy with
that. In fact, I am not sure I am happy with that clause in Unicode.


[About the features applied to ranges rather than positions: think about
it and it should be obvious 8-) It does not make sense to apply a
ligature at a position; what makes sense is to apply a ligature on a
range. Think about 1->n substitutions; whatever lookups apply to the
source glyph should also apply to all the replacement glyphs - ranges
again. I even believe that this approach is compatible with the current
OpenType spec. More details on demand.]
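
One possible reading of the range idea, sketched in Python. The
representation is an assumption made for the example: each glyph is a
(name, range_id) tuple, range_id is None where 'dlig' is not applied, and
the helper names expand and may_ligate are hypothetical. It is not a claim
about how the OpenType spec or any engine stores features.

```python
def expand(stream, index, replacements):
    """1->n substitution: every replacement glyph inherits the range of
    the source glyph, so the same lookups keep applying to all of them."""
    _, range_id = stream[index]
    new = [(glyph, range_id) for glyph in replacements]
    return stream[:index] + new + stream[index + 1:]

def may_ligate(stream, start, count):
    """A ligature over `count` glyphs is allowed only if all of them lie
    in one and the same 'dlig' range."""
    span = stream[start:start + count]
    if len(span) < count:
        return False  # not enough glyphs left in the stream
    ids = {range_id for _, range_id in span}
    return len(ids) == 1 and None not in ids
```

With this layout, x and y separated by ZWNJ carry different range ids, so
may_ligate refuses the pair, while a glyph split by expand stays in its
original range together with its neighbours.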
