Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

From: James Kass (
Date: Mon Jul 01 2002 - 08:28:34 EDT

John H. Jenkins wrote:

> That seems pretty clear to me. If you want a "ct" ligature in your
> document because you think it "looks cool," then you use some higher-level
> protocol. The "looks cool" factor simply doesn't apply unless you know
> what font you're dealing with, because "ct" "looks cool" in some fonts,
> but not others.

It's enough that an author would want a "ct" ligature to appear in text;
the motivation for the desire isn't relevant. Authors who want to
specify a certain ligature know about font selection.

One problem with TR28 is that it is worded so that it appears to
be "in addition" to earlier guidelines. This implies that the examples
used in TR27, for one, are still valid. In TR27, font developers are
urged to add things like "f+ZWJ+i" to existing tables where "f+i"
is already present.

Another problem with TR28 is that its date is earlier than the date
on TR27. This suggests that TR27 is more current.

Another issue is that a search of the Unicode site for "controlling
ligatures" gives TR27 as a hit, but not TR28.

Having slept on this, I concur that it might be "cool" to be able to
turn on or turn off ligatures over a range of text or an entire file
using a higher-level protocol. However, options should be preserved
for the user. Ligature selection is a task for the author/typesetter
at the fundamental level; it should not be completely left to the
rendering system.

> The programs that provide ligature control do so by means of having the
> user select a range of text and then changing the level of ligation. The
> type formats like OpenType or AAT support this by allowing the type
> designer to categorize ligatures as "common," "rare," "required," and so
> on. Thus, if I'm typesetting a document in Adobe InDesign, I'll select
> text, and turn "rare" ligatures on and thus see the "ct" ligature, if it
> exists in the font and if the type designer has designated it a "rare"
> ligature.

That's a lot of ifs and it leaves too much to chance. When an author
determines that, for instance, a "ct" ligature is required, there needs
to be a method to encode it which is unambiguous. ZWJ fits the bill.
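
As a minimal sketch of what such an encoding looks like in plain text
(Python, standard library only; the helper names request_ligature and
strip_ligature_requests are invented here for illustration), the ZWJ is
U+200D placed between the two letters. This only records the author's
request; whether a ligature glyph actually appears still depends on
the font and the renderer.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER


def request_ligature(text: str, pair: str) -> str:
    """Insert ZWJ between the two letters of every occurrence of `pair`.

    Hypothetical helper: it marks the request in the text itself and
    cannot guarantee that the font has a matching glyph.
    """
    if len(pair) != 2:
        raise ValueError("pair must be exactly two characters")
    return text.replace(pair, pair[0] + ZWJ + pair[1])


def strip_ligature_requests(text: str) -> str:
    """Remove every ZWJ (including any used for other purposes)."""
    return text.replace(ZWJ, "")


marked = request_ligature("actor", "ct")
print([f"U+{ord(ch):04X}" for ch in marked])
# ['U+0061', 'U+0063', 'U+200D', 'U+0074', 'U+006F', 'U+0072']
```

An author who wants the ligature in only one place would insert the
single U+200D by hand rather than call a blanket helper like this one.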

> To be frank, turning on an optional "ct" ligature throughout a document by
> means of inserting ZWJ everywhere you want it to take place makes as much
> sense in that model—the model that Western typography uses for languages
> such as English—as having the user insert a <i></i> pair around every
> letter they want in italics.

Not at all. This is apples and oranges. The italic tags operate upon
every character in the enclosed string equally. Using a similar ligature
tag would be expected to make ligatures wherever possible within the
enclosed string according to the user system's ability to render
ligatures... irrespective of the author's intent. Depending upon the
system, the same run of text could be expressed with no ligatures
at all in a monospaced font or as scriptio continua in a handwriting
font.

Furthermore, ZWJ doesn't require proprietary software or proprietary
rich text formats which are often not exchangeable.

> Remember, Unicode is aiming at encoding *plain text*. For the bulk of
> Latin-based languages, ligation control is simply not a matter of *plain
> text*—that is, the message is still perfectly correct whether ligatures
> are on or off. There are some exceptional cases. The ZWJ/ZWNJ is
> available for such exceptional cases.

Three cheers for plain text! But, we disagree about 'perfectly correct'.
If an author is reproducing an older document in which the "ct"
ligature is used, rendering the "ct" string rather than the ligature
is not faithful to the source. (Even though it might be semantically
equivalent—it is merely approximately correct...)

How about "Encyclopædia Britannica"? That's plain text enough.
It's the title of a book; it isn't italic, bold, blue, or green. To cite
from "Encyclopedia" or "Encyclopaedia" would be correct, but not
perfectly so.

Unicode provides the long "s" form, which is arguably a presentation
form. Users have the option of directly encoding the long s form
where it is either appropriate or desired. Trusting something like
long-s-substitution to a higher protocol is not desirable because of
exceptional cases like "Malmesbury" in which the final "s" is used
medially. Fortunately, since the long s is a Unicode character, no
one has to resort to higher protocols. Likewise for the "oe" ligature
and other Latin ligatures which are directly covered by Unicode.
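
To illustrate the "Malmesbury" point, here is a rough sketch in Python
of an automatic substitution rule. The rule (every lowercase "s" except
a word-final one becomes long s) and the function name naive_long_s are
assumptions made for this example, not a description of any real
rendering system.

```python
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S


def naive_long_s(word: str) -> str:
    """Replace every lowercase 's' except a word-final one with long s.

    Deliberately simplistic rule, assumed for illustration only;
    historical practice had more conventions than this.
    """
    if not word:
        return word
    body, last = word[:-1], word[-1]
    return body.replace("s", LONG_S) + last


print(naive_long_s("Malmesbury"))  # Malmeſbury
```

The rule puts a long s in the middle of the word, which is exactly the
case described above as an exception. An author who types the intended
characters directly never meets the problem.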

"Onomatopoeia" and "Onomatopœia" are the same in one sense, much
like "font" and "fount". Yet both pairs are also different. Unicoders
have the option of specifying the "oe" ligature in plain text at the
fundamental level. It is suggested that the Standard be consistent
with regard to Latin ligatures in this respect and preserve the use
of ZWJ for this purpose.
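
A short check with Python's standard unicodedata module shows how the
directly encoded forms behave at the character level. U+FB01 (the "fi"
compatibility ligature) is not discussed above and is included here
only for comparison.

```python
import unicodedata

# Code points: oe ligature, ae, long s, fi compatibility ligature.
samples = ("\u0153", "\u00e6", "\u017f", "\ufb01")

# Map each character to its NFKC (compatibility-folded) form.
folded = {ch: unicodedata.normalize("NFKC", ch) for ch in samples}

for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {folded[ch]!r}")

# The two spellings are different strings, before and after folding.
print("Onomatopoeia" == unicodedata.normalize("NFKC", "Onomatop\u0153ia"))
# False
```

U+0153 and U+00E6 survive compatibility normalization unchanged, while
the long s folds to "s" and U+FB01 folds to "fi".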

Best regards,

James Kass.
