Re: ZWJ and Latin Ligatures (was Re: (long) Re: Chromatic font research)

From: John H. Jenkins (
Date: Mon Jul 01 2002 - 12:37:51 EDT

On Monday, July 1, 2002, at 06:28 AM, James Kass wrote:

> John H. Jenkins wrote:
>> That seems pretty clear to me. If you want a "ct" ligature in your
>> document because you think it "looks cool," then you use some
>> higher-level protocol. The "looks cool" factor simply doesn't apply
>> unless you know what font you're dealing with, because "ct" "looks
>> cool" in some fonts, but not others.
> It's enough that an author would want a "ct" ligature to appear in text;
> the motivation for the desire isn't relevant. Authors who want to
> specify a certain ligature know about font selection.

Au contraire, because of the italic analog. I may *want* a particular
word to be in italics, but that doesn't mean that the italics belong in
plain text.

It is not the goal of Unicode to allow the complete representation of an
author's intent in plain text. I can't typeset "Alice in Wonderland" in
plain text. I'm sorry, but the Mouse's tail would simply get in the way.

There's another level of problem here, too. What if it isn't the author's
intent, but an artifact of the particular typesetter?

> One problem with TR28 is that it is worded so that it appears to
> be "in addition" to earlier guidelines. This implies that the examples
> used in TR27, for one, are still valid. In TR27, font developers are
> urged to add things like "f+ZWJ+i" to existing tables where "f+i"
> is already present.

And for the record, Apple is doing that.

> Another problem with TR28 is that its date is earlier than the date
> on TR27. This suggests that TR27 is more current.

This may be a point for clarification in TR28.

> Another issue is that a search of the Unicode site for "controlling
> ligatures" gives TR27 as a hit, but not TR28.
>
> Having slept on this, I concur that it might be "cool" to be able to
> turn on or turn off ligatures over a range of text or an entire file
> using a higher level protocol. However, options should be preserved
> for the user. Ligature selection is a task for the author/typesetter
> at the fundamental level; it should not be completely left to the
> rendering system.

Er, James. I've never said it should. The rendering system should have
the ability to do default ligation. The user should be able to override
that behavior. That's what happens on systems I see. If they do ligation
at *all*, they have a default behavior which can be overridden.

>> The programs that provide ligature control do so by means of having the
>> user select a range of text and then changing the level of ligation. The
>> type formats like OpenType or AAT support this by allowing the type
>> designer to categorize ligatures as "common," "rare," "required," and so
>> on. Thus, if I'm typesetting a document in Adobe InDesign, I'll select
>> text, and turn "rare" ligatures on and thus see the "ct" ligature, if it
>> exists in the font and if the type designer has designated it a "rare"
>> ligature.
> That's a lot of ifs and it leaves too much to chance. When an author
> determines that, for instance, a "ct" ligature is required, there needs
> to be a method to encode it which is unambiguous. ZWJ fits the bill.

I'll repeat a point that I've made over and over and over.

The "ct" ligature does not exist in and of itself. It is a part of a
typeface. It doesn't make sense in general to ask for the formation of a
"ct" ligature without any reference to the typeface you're using.

The implication of what you're saying is that Latin typefaces should be
*required* to have a "ct" ligature on the off chance that the author of
text determines that it's "required" in a particular context. That gives
most type designers the heebie jeebies. It's bad enough that Adobe and
Apple are making them stick useless "fi" and "fl" ligatures in their fonts.

In any event, if an author determines that a "ct" ligature is honestly and
absolutely *required* in a particular context (as opposed to being
desirable), then the ZWJ mechanism exists.
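
For concreteness, the ZWJ mechanism amounts to placing U+200D ZERO WIDTH
JOINER between the two letters. A minimal Python sketch follows. The helper
names request_ligature and strip_joiners are made up for illustration, and
whether a "ct" ligature actually shows up still depends on the font and the
rendering system, which is the point above.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def request_ligature(text: str, pair: str) -> str:
    """Insert ZWJ between the two letters of every occurrence of `pair`.

    Hypothetical helper: it only marks the text. It cannot make a font
    that lacks the ligature produce one.
    """
    if len(pair) != 2:
        raise ValueError("pair must be exactly two characters")
    return text.replace(pair, pair[0] + ZWJ + pair[1])


def strip_joiners(text: str) -> str:
    """Remove ZWJ and ZWNJ, recovering the underlying letters."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


marked = request_ligature("actor", "ct")

# Show the code points so the invisible joiner can be seen.
print([f"U+{ord(ch):04X}" for ch in marked])

# The letters themselves are unchanged once the joiners are removed.
print(strip_joiners(marked) == "actor")
```

The marked string is one code point longer than the original, and removing
the joiners gives back exactly the same letters.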

>> To be frank, turning on an optional "ct" ligature throughout a document
>> by means of inserting ZWJ everywhere you want it to take place makes as
>> much sense in that model (the model that Western typography uses for
>> languages such as English) as having the user insert a <i></i> pair
>> around every letter they want in italics.
> Not at all. This is apples and oranges. The italic tags operate upon
> every character in the enclosed string equally. Using a similar ligature
> tag would be expected to make ligatures wherever possible within the
> enclosed string according to the user system's ability to render
> ligatures... irrespective of the author's intent. Depending upon the
> system, the same run of text could be expressed with no ligatures
> at all in a monospaced font or as scriptio continua in a handwriting
> font.

Er, you've just made my point, haven't you? The typeface makes a
difference. If you're ever in a situation where the typeface of the
originator may be different from the typeface of the receiver, you've lost
the ability to say whether or not ligatures should be used in a particular
context. Or do you want a "ct" ligature in Courier?

> Furthermore, ZWJ doesn't require proprietary software or proprietary
> rich text formats which are often not exchangeable.

Then we need to beef up the rich text formats to handle this.

>> Remember, Unicode is aiming at encoding *plain text*. For the bulk of
>> Latin-based languages, ligation control is simply not a matter of *plain
>> text*—that is, the message is still perfectly correct whether ligatures
>> are on or off. There are some exceptional cases. The ZWJ/ZWNJ is
>> available for such exceptional cases.
> Three cheers for plain text! But, we disagree about 'perfectly correct'.
> If an author is reproducing an older document in which the "ct"
> ligature is used, rendering the "ct" string rather than the ligature
> is not faithful to the source. (Even though it might be semantically
> equivalent—it is merely approximately correct...)

I'm sorry, but you've totally lost me here.

If I want to reproduce, say, my reproduction of the 1611 KJV, it's equally
incorrect to use a sans-serif typeface. Actually, technically, my
reproduction is already doing something very naughty by this standard,
since the *real* 1611 KJV was in blackletter.

The precise reproduction of the appearance of a text is *NOT* possible in
plain text. It is *NOT* the intention of Unicode to make it possible.

> How about "Encyclopædia Britannica"? That's plain text enough.
> It's the title of a book; it isn't italic, bold, blue, or green. To cite
> from "Encyclopedia" or "Encyclopaedia" would be correct, but not
> perfectly so.

I'd say it's perfectly correct.

I pick up any of the various Shakespeares on my shelf. Some of them use
the æ ligature and some of them don't. It's a matter of choice in English
typography. Some books I have talk about hæmoglobin, some about
haemoglobin, and some about hemoglobin. If I'm publishing a book on blood
chemistry, and the typesetter has turned all my hemoglobins into
hæmoglobin, I may question their sanity, but I won't object that they got
it "wrong." (Well, I would in the US, but not in the UK, and that's
because I realize that the "ae/æ" spelling is the preferred one over there.)
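
The spelling point can be checked against the character data. The sketch
below uses Python's standard unicodedata module; the variable names are
illustrative only. It suggests that æ is encoded as a letter in its own
right, with no decomposition, so normalization does not turn "hæmoglobin"
into "haemoglobin", whereas the presentation-form ligature U+FB01 does fold
to plain "fi" under compatibility normalization.

```python
import unicodedata

ae = "\u00e6"      # the character æ
fi_lig = "\ufb01"  # the presentation-form "fi" ligature

# Character names as recorded in the Unicode Character Database.
print(unicodedata.name(ae))
print(unicodedata.name(fi_lig))

# An empty string here means the character has no decomposition mapping.
print(repr(unicodedata.decomposition(ae)))
print(repr(unicodedata.decomposition(fi_lig)))

# Compatibility normalization leaves æ alone but unfolds the fi ligature.
ae_spelling = unicodedata.normalize("NFKC", "h\u00e6moglobin")
fi_folded = unicodedata.normalize("NFKC", fi_lig)
print(ae_spelling == "haemoglobin")
print(fi_folded == "fi")
```

In other words, choosing between "æ" and "ae" is a choice between two
different spellings in the text itself, not a rendering option.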

> "Onomatopoeia" and "Onomatopœia" are the same in one sense, much
> like "font" and "fount". Yet both pairs are also different. Unicoders
> have the option of specifying the "oe" ligature in plain text at the
> fundamental level. It is suggested that the Standard be consistent
> with regard to Latin ligatures in this respect and preserve the use
> of ZWJ for this purpose.

I'm not saying that we should get rid of the ZWJ mechanism. I lost that
fight a long time ago. What I am saying is that in English typography and
Latin typography in general, ligature formation is a stylistic choice.
There are exceptions. The ZWJ mechanism is inappropriate where ligature
formation is a matter of stylistic preference, and appropriate where it

John H. Jenkins
