RE: [OpenType] Proposal: Ligatures w/ ZWJ in OpenType

From: John Hudson
Date: Thu Jul 11 2002 - 13:07:50 EDT

At 05:09 AM 11-07-02, Paul Nelson wrote:

>Thus, what John Hudson is wanting to do is to have "f" + ZWJ + "i" be
>required to make the "fi" ligature by using the <rlig> feature. Any font
>that does not have OpenType support, or some other smart font rendering,
>would ignore this and not render the ligature.


>Another example: "a" + ZWJ + combining acute + ZWJ + "e" would be required
>to produce an "ae" ligature with the combining acute over the a portion of
>the ligature. Is this reasonable?

It would be reasonable if the font contained such a ligature form. The
equivalent sequence

         a + combining acute + ZWJ + e

would probably need to be catered for. I think the larger issue in this
example is not about whether the diacritic ligature can be formed, but what
happens if the ligature cannot be formed? If the desired final rendering,
in the absence of a ligature is

         a with acute + e

it will be necessary for the text engine to ignore the ZWJ between the 'a'
and the combining mark in order to be able to correctly position the

>Asmus is correct in needing to consider other languages. Saying that the
>ZWJ causes Arabic to ligate would not be correct.

I'm deliberately trying not to consider *any* specific languages, but to
define a general proposal for rendering ZWJ sequences *as* ligatures when
appropriate ligatures are available in a font, regardless of language. I am
not suggesting that users should or should not be using ZWJ in any
documents: frankly I hope most people will avoid them as much as possible.
But if an author includes a ZWJ character in a document, I presume it is
because they want to see characters ligated. We're making fonts with
ligatures in them, so it seems silly not to make such ligatures responsive
to explicit requests for ligature formation.

>It already is defined to cause correct contextual shaping (isol, initial,
>medial, final) forms. In fact, LAM + ZWJ + ALEF breaks the required
>ligature formation because it sticks something in the middle of the
>context and proves what the Unicode book says, "in some systems they may
>break up ligatures by interrupting the character sequence required to form
>the ligature." Should font vendors then have to not only code the normal
>ligature formation, but also have to code shaping rules to make the ZWJ
>work as well?

It is up to individual font developers and text processing engineers to
decide how much tolerance they want to extend to users who code text in
unexpected or incorrect ways. What would Uniscribe do if it encountered the
sequence

         lam + ZWJ + alef

Would the ligature break? Do you care? Would it be possible for Uniscribe
to completely ignore the ZWJ character and form the ligature anyway? Would
it be possible for an individual font to cover this sequence in the <rlig>
feature lookups?
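
To make these questions concrete, here is a minimal Python sketch of the two
options: a font that lists the ZWJ-bearing sequence in its <rlig> lookups, and
an engine that silently drops a ZWJ when no ligature is available. This is a
toy model only. It does not describe Uniscribe's behaviour or any real shaping
API; the names RLIG and shape and the glyph names are hypothetical.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Hypothetical <rlig> lookups: the font covers both the plain sequence
# and the same sequence with a ZWJ in the middle.
RLIG = {
    ("lam", "alef"): "lam_alef",
    ("lam", ZWJ, "alef"): "lam_alef",
}

def shape(chars, lookups=RLIG):
    """Greedy longest-match ligature substitution over a list of glyph names."""
    out = []
    i = 0
    max_len = max((len(seq) for seq in lookups), default=0)
    while i < len(chars):
        # Try the longest candidate sequence first, down to length 2.
        for n in range(min(max_len, len(chars) - i), 1, -1):
            key = tuple(chars[i:i + n])
            if key in lookups:
                out.append(lookups[key])
                i += n
                break
        else:
            # No ligature starts here. A leftover ZWJ has no visible
            # glyph, so the engine drops it instead of rendering it.
            if chars[i] != ZWJ:
                out.append(chars[i])
            i += 1
    return out

print(shape(["lam", ZWJ, "alef"]))  # ligature formed despite the ZWJ
print(shape(["f", ZWJ, "i"]))       # no such ligature: ZWJ dropped
```

In this model the ligature does not break, because the font itself covers the
ZWJ sequence; the engine only has to discard any ZWJ that is left over.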

I'm not suggesting that anyone *should* be utilising my proposal for
handling ZWJ sequences in the <rlig> feature, only that this is a sensible
way to handle ZWJ sequences if one is inclined to do so.

>My opinion is that the definition of the ZWJ and ZWNJ characters needs to
>be kept generic and not defined for use based on a particular script. As
>such, it seems to me that changing the ZWJ to have a meaning to force a
>ligature as normative behavior would be a bad thing to do. There might be
>some valid arguments to make this informative on a script by script basis.
>However, driving a standard based on exceptions is not a good habit to
>get into.

>I believe that the current text in the Unicode Standard 3.0 book allows
>John Hudson to make lookups in his fonts for the required ligature
>feature that will cause the ligature to be formed if the ZWJ is used. John
>would simply need to communicate this to his font users so they know his
>So, is there a problem if we leave the Unicode Standard's definition as it
>is currently written?

I don't recall making any suggestion to change the Unicode Standard's
definition. My proposal concerns an implementation of that definition that
can be used to provide the user with the ligation that the presence of the
ZWJ explicitly requests as a desirable display of a sequence of characters.
Having a ZWJ character that is simply always and everywhere ignored seems
perverse: why bother having the character at all if it is invisible and is
never expected to do anything? If it is expected to do something, I have
provided a proposal for how this can be achieved in OpenType without
getting in the way of other ligature layout features.

As Paul says, there is nothing to stop me from using this convention in my
own fonts and communicating this to my customers. I've made the proposal
public because other vendors might like to do likewise, and because I think
there are too many problems associated with any other approach I can think
of. I hope the proposal will convince font developers of the need to keep
ZWJ sequence ligation distinct from standard or discretionary ligation. It
would, however, be most useful if this approach was supported in text
engines by the general application of the <rlig> feature for any script.
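
The separation argued for above can be sketched in Python as follows. This is
a rough illustration under stated assumptions, not a real font or engine: the
FONT table, the substitute and shape functions, and the glyph names are all
hypothetical, and the sketch assumes an engine that applies <rlig> to every
script while leaving <liga> and <dlig> under user control.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Hypothetical font: ZWJ sequences live only in <rlig>; the ordinary
# sequences live in the optional <liga> and <dlig> features.
FONT = {
    "rlig": {("f", ZWJ, "i"): "f_i", ("c", ZWJ, "t"): "c_t"},
    "liga": {("f", "i"): "f_i"},
    "dlig": {("c", "t"): "c_t"},
}

def substitute(glyphs, table):
    """Replace every sequence listed in `table` with its ligature glyph."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, lig in table.items():
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:
            out.append(glyphs[i])
            i += 1
    return out

def shape(glyphs, optional=()):
    """Apply <rlig> always, then only the optional features requested."""
    features = ["rlig"] + [f for f in ("liga", "dlig") if f in optional]
    for feature in features:
        glyphs = substitute(glyphs, FONT[feature])
    # Any ZWJ that did not take part in a ligature is invisible.
    return [g for g in glyphs if g != ZWJ]

print(shape(["f", ZWJ, "i"]))             # explicit request: ligated
print(shape(["f", "i"]))                  # <liga> off: not ligated
print(shape(["f", "i"], optional={"liga"}))  # <liga> on: ligated
```

Because the ZWJ sequences sit only in <rlig>, turning standard or
discretionary ligatures on or off has no effect on an explicit ZWJ request.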

John Hudson

Tiro Typeworks
Vancouver, BC

Language must belong to the Other -- to my linguistic community
as a whole -- before it can belong to me, so that the self comes to its
unique articulation in a medium which is always at some level
indifferent to it. - Terry Eagleton
