Proposal: Ligatures w/ ZWJ in OpenType

From: John Hudson (tiro@tiro.com)
Date: Sat Jul 06 2002 - 13:01:44 EDT


This message is cross-posted to the OpenType and Unicode lists. Feel free
to forward it to any other lists or individuals to whom it might be of
interest.

There has recently been discussion on the Unicode list of Latin ligatures
and the appropriateness or inappropriateness of using the Zero Width Joiner
(ZWJ) character to specify ligation in Latin-script documents. There is a
difference of opinion between those who believe that

a) Ligatures are classifiable and some classes of ligatures should be
active by default for normal Latin-script text (e.g. the f-ligatures that
are designed to improve the spacing of letter sequences involving that
letter). Ligatures should be activated or deactivated in line layout, via
the application of layout features (e.g. OpenType GSUB) to normal text such
that

        o f f i c e -> o ffi c e

This facilitates the typesetting of any electronic document, without
reference to authorial decision regarding the use of ligatures, according
to long-standing typographic traditions, publishing house-styles, etc.

Such classes of ligatures have been in continuous use as standard elements
of Latin-script typography for more than 500 years. They improve the
'colour' of text and while it is important for users to be able to
deactivate them, particularly when text is set at large sizes, they should
be active by default.

Other classes of ligatures are principally decorative, e.g. the historic ct
and st ligatures, and these have not been in continuous use as standard
elements of Latin-script typography. These should also be activated via
layout features, but these features should not be active by default.

The current registered OpenType Layout features provide separate mechanisms
for these different classes of ligatures: Standard Ligatures <liga> and
Discretionary Ligatures <dlig>.

and those who believe

b) The use of all ligatures is discretionary and exceptional, and should be
determined by the author of the document using the ZWJ character to signify
ligation of two or more characters, such that

        o f f i c e -> o f f i c e
but

        o f ZWJ f ZWJ i c e -> o ffi c e

It is acknowledged that there are scripts and orthographic traditions, e.g.
Runic and Old Hungarian, in which ligature use is not standard but is
encountered as a freely applied manuscript element, i.e. the same sequence
of letters may be ligated in one occurrence but not in the next. Such usage
has been amply documented by Michael Everson. Latin ligatures should be
treated in the same way, and their application should be explicit in text
rather than in layout.

There are clearly documents in which the presence or absence of ligation in
specific circumstances is important to the correct display and/or
understanding of the text. Authors citing older documents, especially in
palaeographic studies, need to be able to indicate whether ligatures were
used or not used, or used inconsistently, in the original. This information
needs to travel with the electronic document, and must not be subject to
changes to layout in downstream applications.

My own opinion is that both views include valid needs. As a professional
typographer and type designer, I fully endorse the first view: some
ligature classes are standard elements of well-formed typography, as
appropriate to the typeface in use, and are not optional in 'normal' texts
(i.e. texts in which the presence or absence of ligatures is not
significant to the content). On the other hand, while I reject the notion
that all ligature use should be authorially determined, I believe that
there are many legitimate circumstances for such determination, especially
in the area of manuscript and document studies, and that there needs to be
a mechanism for fonts to provide correct shaping *independent* of existing
mechanisms for standard or discretionary ligatures in 'normal' text.

Since the use of the ZWJ character to signify ligation implies a clear
authorial directive that ligation *must* be used to correctly represent the
sequence, I would like to propose that font developers interested in
supporting the use of ZWJ in ligation should do so in the Required
Ligatures <rlig> feature (currently used principally for obligatory Arabic
ligatures). This provides a separate mechanism for forming such ligatures,
one that will not be affected by the deactivating of Standard or
Discretionary Ligature features. A set of lookups for a common Latin font
might look like this:

<rlig>
        f ZWJ f ZWJ i -> ffi
        f ZWJ f ZWJ j -> ffj
        f ZWJ f ZWJ l -> ffl
        f ZWJ f -> ff
        f ZWJ i -> fi
        f ZWJ j -> fj
        f ZWJ l -> fl
<liga>
        f f i -> ffi
        f f j -> ffj
        f f l -> ffl
        f f -> ff
        f i -> fi
        f j -> fj
        f l -> fl
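
As a rough illustration of how these two sets of lookups would interact,
here is a minimal Python sketch. It is a toy model, not how any real layout
engine is implemented: the glyph names (f_f_i etc.), the substitute() and
shape() helpers, and the liga switch are all hypothetical names invented
for this example, and only a subset of the lookups above is included.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Each rule maps an input glyph sequence to a (hypothetical) ligature glyph.
# Longer sequences come first so that they win over their own prefixes.
RLIG = [
    (("f", ZWJ, "f", ZWJ, "i"), "f_f_i"),
    (("f", ZWJ, "f", ZWJ, "l"), "f_f_l"),
    (("f", ZWJ, "f"), "f_f"),
    (("f", ZWJ, "i"), "f_i"),
    (("f", ZWJ, "l"), "f_l"),
]
LIGA = [
    (("f", "f", "i"), "f_f_i"),
    (("f", "f", "l"), "f_f_l"),
    (("f", "f"), "f_f"),
    (("f", "i"), "f_i"),
    (("f", "l"), "f_l"),
]


def substitute(glyphs, rules):
    """Scan left to right; at each position the first matching rule wins."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, lig in rules:
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:
            # No rule matched here: pass the glyph through unchanged.
            out.append(glyphs[i])
            i += 1
    return out


def shape(text, liga=True):
    """Apply <rlig> unconditionally, then <liga> only if it is switched on."""
    glyphs = substitute(list(text), RLIG)
    if liga:
        glyphs = substitute(glyphs, LIGA)
    return glyphs


print(shape("office"))                                  # liga on
print(shape("office", liga=False))                      # liga off
print(shape("of" + ZWJ + "f" + ZWJ + "ice", liga=False))  # ZWJ, liga off
```

With <liga> on, plain 'office' picks up the ffi ligature; with <liga> off it
does not; and the ZWJ-marked sequence picks it up regardless of the <liga>
setting, which is the point of putting those lookups in <rlig>.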

Similarly, I propose the use of the <rlig> feature for any script, e.g.
Runic, in which the ZWJ is expected to be used to signify ligation.

(Note for font developers: the <rlig> lookups should precede the <liga>
lookups. There may be circumstances, e.g. using a calligraphic font with
many ligatures, in which a user may want to use the ZWJ character to assert
a preference for, e.g. an r_d ligature over an i_r ligature in the word
'bird', even though the i_r ligature may precede the r_d in the <liga>
lookups.)
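
The ordering point in the note above can be sketched with the same kind of
toy model. Everything here is assumed for illustration: a calligraphic font
with hypothetical i_r and r_d ligature glyphs, a substitute() helper, and a
shape() function whose order argument stands in for lookup order in the
font.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Hypothetical calligraphic font: i_r is listed before r_d in <liga>.
RLIG = [(("i", ZWJ, "r"), "i_r"), (("r", ZWJ, "d"), "r_d")]
LIGA = [(("i", "r"), "i_r"), (("r", "d"), "r_d")]
FEATURES = {"rlig": RLIG, "liga": LIGA}


def substitute(glyphs, rules):
    """Scan left to right; at each position the first matching rule wins."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, lig in rules:
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:
            out.append(glyphs[i])
            i += 1
    return out


def shape(text, order=("rlig", "liga")):
    """Apply the named features one after another, in the given order."""
    glyphs = list(text)
    for tag in order:
        glyphs = substitute(glyphs, FEATURES[tag])
    return glyphs


marked = "bir" + ZWJ + "d"
print(shape("bird"))                            # default: i_r wins
print(shape(marked))                            # rlig first: r_d wins
print(shape(marked, order=("liga", "rlig")))    # liga first: ZWJ is stranded
```

In this model, when <liga> runs first it consumes the r into i_r, so the
'r ZWJ d' sequence no longer exists by the time <rlig> is applied and the
author's stated preference is lost.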

Note: use of ZWJ in a document obviously cannot be made with any
expectation of an appropriate ligature being present in a font. A document
might contain the sequence 'o r ZWJ d o', but this can only be correctly
rendered in a font that contains an r_d ligature. If such a ligature is not
available, the sequence will appear as 'ordo' to the reader, because the
ZWJ is invisible and occupies no space. This is as intended. The important
thing is that the desire for ligation travels with the text, and can be
correctly rendered when an appropriate font is used.
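
A small sketch of this fallback behaviour, again as a toy model with
assumed names: two hypothetical fonts (one without and one with an r_d
ligature), a substitute() helper, and a visible() helper that simply drops
the ZWJ glyph to approximate what the reader sees, since that glyph has no
ink and no advance width.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Two hypothetical fonts: only the second one contains an r_d ligature.
RLIG_PLAIN_FONT = [(("f", ZWJ, "i"), "f_i")]
RLIG_CALLIGRAPHIC_FONT = RLIG_PLAIN_FONT + [(("r", ZWJ, "d"), "r_d")]


def substitute(glyphs, rules):
    """Scan left to right; at each position the first matching rule wins."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, lig in rules:
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:
            out.append(glyphs[i])
            i += 1
    return out


def visible(glyphs):
    """Drop the invisible, zero-width ZWJ glyph from the glyph run."""
    return [g for g in glyphs if g != ZWJ]


text = "or" + ZWJ + "do"
print(visible(substitute(list(text), RLIG_PLAIN_FONT)))
print(visible(substitute(list(text), RLIG_CALLIGRAPHIC_FONT)))
print(ZWJ in text)  # the request for ligation stays in the stored text
```

The stored text is identical in both cases; only the font determines
whether the requested ligature can actually be shown.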

Implementation issues:

In order to support the use of ZWJ ligation as outlined in this proposal,
font developers need to include the ZWJ character in their fonts, and
appropriate <rlig> lookups mapping ZWJ sequences to any ligatures present
in the font. I also recommend that font developers include the Zero Width
Non-Joiner (ZWNJ) character in their fonts, as this is likely to be used
in tandem with ZWJ by authors who wish to explicitly indicate the absence
of ligation. The presence of the ZWNJ is sufficient: no layout information
is required to inhibit ligation. The addition of these characters and
<rlig> lookups to any font that already contains <liga> or <dlig> lookups
is a trivial task.
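
To show why no extra layout information is needed for ZWNJ, here is one
more toy sketch under the same assumptions as before (a hypothetical f_i
glyph name and a simplified substitute() helper). The ZWNJ glyph sits
between f and i in the glyph run, so the ordinary 'f i' rule simply fails
to match.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# An ordinary <liga> rule; note that nothing in it mentions ZWNJ.
LIGA = [(("f", "i"), "f_i")]


def substitute(glyphs, rules):
    """Scan left to right; at each position the first matching rule wins."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, lig in rules:
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:
            out.append(glyphs[i])
            i += 1
    return out


print(substitute(list("fi"), LIGA))               # ligature forms
print(substitute(list("f" + ZWNJ + "i"), LIGA))   # ZWNJ interrupts the match
```

This only models the matching step. It assumes the font maps ZWNJ to an
empty zero-width glyph, which is the reason for recommending that the
character be included in the font.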

Layout engines that provide text processing for the Latin script need to
support the <rlig> feature and to apply it as they would for Arabic.

John Hudson

Tiro Typeworks www.tiro.com
Vancouver, BC tiro@tiro.com

Language must belong to the Other -- to my linguistic community
as a whole -- before it can belong to me, so that the self comes to its
unique articulation in a medium which is always at some level
indifferent to it. - Terry Eagleton


