Re: Proposal: Ligatures w/ ZWJ in OpenType

From: James Kass (jameskass@worldnet.att.net)
Date: Thu Jul 11 2002 - 01:25:37 EDT


John Hudson wrote,

> On the whole, Paul, I share your concerns about the creeping advance of
> quasi-typographic layout elements in what is ostensibly a plain text
> encoding standard. I do feel, however, that we can afford, within the
> existing OpenType Layout structure and without inventing new features, to
> provide a simple mechanism for resolving the ambiguity between ligation as
> a matter of typographic display and ligation that amounts to an aspect of
> orthography. In the recent Unicode list discussion, James Kass suggested
> that ligation was a matter of spelling, rather than of typographic display.
> I disagree, but I think there are specialist cases in which ligation is
> something closer to spelling than to the typographic display. It is those
> cases that I am seeking to address in my proposal.

Actually, we seem to agree here.

Rather than suggesting that ligation was a matter of spelling, I only
said that I considered ligation to be more akin to spelling than to
effects like italic or bold. I had special cases in mind.

There are a multitude of special cases such as paleography, papyrology,
medieval studies, religious studies, Turkish, German, and so forth. In
addition to the classic Turkish example, there are other modern
languages which should not have standard English ligation enabled by
default. Many languages use adaptations of the Latin script which also
need to use combining diacritics. For instance, the Yoruba word for
coffee is ko̩fí. If the precomposed i-acute is used, no problem, but if
the combining acute is used there may be a problem, depending on
look-up order and other factors.
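The difference between the two encodings can be seen in a small Python sketch. It is only an illustration: it shows that a plain "f" + "i" sequence exists in the text when the combining acute is used, which is the sequence a ligature look-up would be written against. Whether a given font actually ligates there depends on that font's look-ups, which this sketch does not model.

```python
import unicodedata

# Yoruba "coffee" with i-acute as one precomposed code point (U+00ED).
precomposed = "ko\u0329f\u00ed"
# The same word with plain "i" followed by COMBINING ACUTE ACCENT (U+0301).
decomposed = "ko\u0329fi\u0301"

# The two strings differ code point by code point...
print(precomposed == decomposed)                                  # False
# ...but are canonically equivalent: NFC composes i + U+0301 into U+00ED.
print(unicodedata.normalize("NFC", decomposed) == precomposed)    # True

# Only the decomposed form contains the bare sequence "f", "i".
print("fi" in precomposed)                                        # False
print("fi" in decomposed)                                         # True
```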

Another 'exceptional' case might be multilingual documents. Suppose
that an English language file includes a quote in Turkish, or someone
makes a page with a certain phrase translated into many languages.
Whether to use mark-up or plain text mechanisms is clearly arguable,
and one of the reasons for liking John Hudson's proposal is that
options are preserved.

If a French author includes a brief German quote from Goethe in a
French web page, must the reader download a complete German
dictionary and accompanying software in order to properly display
the text?

Asmus Freytag has requested that non-English requirements be taken
into account during these discussions. With so many 'special' cases,
including those listed above, I suggest that, in the multilingual
world, English should be considered a special case, too.

Paul Nelson's example of newspaper columns is interesting. In last
week's local newspaper here, I saw this in practice. Frankly, the use
of the ligature inside of a word which was e x p a n d e d looked like
a mistake. In this English language newspaper, any automatic ligation
should have been broken on that line by the author or proofreader
with ZWNJ. (Assuming this newspaper uses modern methods...)
As John Hudson points out, ZWNJ should break automatic ligation
without needing any software modification.

Paul mentions the 'hit' upon existing implementations, but existing
software which doesn't ignore ZWJ/ZWNJ in search operations, &c.,
is already wrong regardless of John's proposal and needs to be fixed.

Quoting from Unicode 2.0, page 6-71: "ZERO WIDTH NON-JOINER or
ZERO WIDTH JOINER are format control characters. As with other
such characters, they should be ignored by processes that analyze
text content. ..." On that same page are examples showing the
formation of the "fi" ligature with ZWJ to make the word "fish".

Here are some examples for the word "fist":

fist (straightforward)
ﬁﬆ (frowned-upon presentation forms)
f‍is‍t (Two ZWJs)
FIST (All Caps)

IMO, a search for the string 'fist' should find all four examples. Of
course, the presentation forms are debatable. They're included here
because equivalences exist, though.
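Here is a rough sketch, in Python, of a search that treats all four forms alike. The function name search_key is made up for this example. It assumes the presentation forms are U+FB01 (fi) and U+FB06 (st), and it is just one way of doing it: strip the two joiners, apply compatibility normalization (NFKC) to fold the presentation forms, then case-fold.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def search_key(text):
    """Hypothetical helper: reduce text to a form suitable for loose matching."""
    # Format controls are ignored when analysing text content.
    stripped = text.replace(ZWJ, "").replace(ZWNJ, "")
    # NFKC maps the ligature presentation forms to their plain letters.
    folded = unicodedata.normalize("NFKC", stripped)
    # casefold() removes the upper/lower case distinction.
    return folded.casefold()

examples = [
    "fist",               # straightforward
    "\ufb01\ufb06",       # presentation forms: fi ligature + st ligature
    "f\u200dis\u200dt",   # two ZWJs
    "FIST",               # all caps
]

matches = [search_key("fist") in search_key(e) for e in examples]
print(matches)  # [True, True, True, True]
```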

As John Hudson noted, it might be better to have the 'search' user
be able to toggle inclusion of ZWJ/ZWNJ. I wholeheartedly agree
with this, and suggest that it is similar to the user being able to
'match case'. Still, this will ultimately be left up to the designers
of text processing software and there should be enough differences
of opinion so that everyone can choose software which meets their
requirements.
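The toggle could sit right next to 'match case'. The sketch below is only an illustration of that idea; the function contains and its two option names are invented here and are not taken from any existing software.

```python
JOINERS = {"\u200c", "\u200d"}  # ZWNJ and ZWJ

def contains(haystack, needle, match_case=False, match_joiners=False):
    """Hypothetical search with two independent user toggles."""
    def prepare(s):
        if not match_joiners:
            # Default: ignore ZWJ/ZWNJ, as with other format controls.
            s = "".join(ch for ch in s if ch not in JOINERS)
        if not match_case:
            s = s.casefold()
        return s
    return prepare(needle) in prepare(haystack)

# Default behaviour: joiners and case are both ignored.
print(contains("a F\u200dISH", "fish"))                       # True
# With 'match joiners' on, the ZWJ in the text blocks a plain "fish".
print(contains("a f\u200dish", "fish", match_joiners=True))   # False
# With 'match case' on, FISH no longer matches fish.
print(contains("FISH", "fish", match_case=True))              # False
```

A user who needs to locate the joiners themselves can switch match_joiners on and include the ZWJ in the search string.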

Earlier in this (or a related) thread, I used the example of the
Encyclopædia Britannica. This encyclopedia (the 1960 version) has
a section called "Encyclopaedia". Throughout this section, the "ae"
ligature isn't used, except, of course, in the sub-section on the
Encyclopædia Britannica itself. Whether the motive is personal
preference, good typography, spelling, or style, and whether the text
is German, English, Latin, or Devanagari, I think that the ability to
make these distinctions is important at the plain text level.

Best regards,

James Kass.


