Further thoughts on ZWJ ligation

From: John Hudson (tiro@tiro.com)
Date: Fri Jul 19 2002 - 13:52:13 EDT

Having digested all the discussion on two lists about implementing
ligation using ZWJ, I am beginning to think that it was a mistake for the
UTC to have overloaded the existing ZWJ character with new ligation
behaviour in response to Michael Everson's ZWL proposal. Indeed, I now
think it would have been a better idea to accept Michael's proposal for a
new, distinct ligating character.

The reason for this is that the existing ZWJ behaviour was already
implemented in a number of layout engines, e.g. in Microsoft's Indic
engine, and the existing behaviour encourages use of ZWJ as a *non-painted*
control character, along with other text control characters like ZWNJ and
the direction marks RLM and LRM. A character that is not painted as a glyph
cannot be used in glyph-level substitutions of the kind recommended for ZWJ
implementation in TR27. This, to my mind, is the biggest stumbling block of
the whole ZWJ ligation issue -- not whether it is best to handle the
substitutions in the <rlig> feature or elsewhere, or whether it is best to
handle ZWJ sequences directly in font lookups or indirectly by having the
line layout engine interpret e.g. the character sequence f+ZWJ+i as glyph
sequence f+i with a ligature feature applied -- but the problem posed by
existing implementations of ZWJ as a control character whose use may or may
not be reconcilable with the new ligating behaviour.
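
To make the stumbling block concrete, here is a rough Python sketch of the
three behaviours described above. It is a toy model only, not OpenType and
not any real engine's API: the tables CMAP, LIGATURES_WITH_ZWJ and
LIGATURES_PLAIN, the glyph names "zwj" and "f_i", and all function names are
hypothetical and invented purely for illustration.

```python
ZWJ = "\u200d"

# Hypothetical toy "font": character-to-glyph map plus two ligature tables.
CMAP = {"f": "f", "i": "i", ZWJ: "zwj"}
# Lookup that expects the ZWJ glyph to be present in the glyph run.
LIGATURES_WITH_ZWJ = {("f", "zwj", "i"): "f_i"}
# Plain ligature lookup, applied only where the engine asks for it.
LIGATURES_PLAIN = {("f", "i"): "f_i"}


def substitute(glyphs, table):
    """Greedy left-to-right replacement of glyph sequences listed in table."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, lig in table.items():
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(lig)
                i += len(seq)
                break
        else:
            out.append(glyphs[i])
            i += 1
    return out


def shape_control_stripped(text):
    """ZWJ treated as a non-painted control: dropped before font lookups run."""
    glyphs = [CMAP[ch] for ch in text if ch != ZWJ]
    return substitute(glyphs, LIGATURES_WITH_ZWJ)


def shape_zwj_painted(text):
    """ZWJ mapped to a glyph, so a font lookup can match the sequence directly."""
    glyphs = [CMAP[ch] for ch in text]
    return substitute(glyphs, LIGATURES_WITH_ZWJ)


def shape_engine_interprets(text):
    """Engine consumes ZWJ itself and applies the plain ligature lookup
    only to the pair of glyphs that the ZWJ sat between."""
    glyphs, join_after = [], set()
    for ch in text:
        if ch == ZWJ:
            if glyphs:
                join_after.add(len(glyphs) - 1)
        else:
            glyphs.append(CMAP[ch])
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if i in join_after and pair in LIGATURES_PLAIN:
            out.append(LIGATURES_PLAIN[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out
```

In this toy model, shape_control_stripped("f" + ZWJ + "i") yields the
unligated glyphs f, i because the ZWJ is gone before the lookup can see it,
whereas both shape_zwj_painted and shape_engine_interprets yield the single
glyph f_i for the same input.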

If the UTC had accepted Michael's original proposal for a ZWL character,
distinct from ZWJ, I think we would be in a better position now, because we
would have been able to develop a clean implementation model for this new
character, without having to worry about how to change existing ZWJ
implementations to make new behaviour possible.

Obviously, there was a long period in the development of Unicode when
implementation issues were largely theoretical, because no one had
developed the smart font formats or Unicode-based layout engines yet. This
is no longer the case, and decisions to add new behaviour to existing
characters have a direct impact on existing implementations.

In all seriousness: what are the chances of the ZWJ ligation decision being
reviewed by the UTC and Michael's original ZWL proposal revisited?

John Hudson

Tiro Typeworks www.tiro.com
Vancouver, BC tiro@tiro.com

Language must belong to the Other -- to my linguistic community
as a whole -- before it can belong to me, so that the self comes to its
unique articulation in a medium which is always at some level
indifferent to it. - Terry Eagleton
