From Unicode 3.1 (On-line)
( http://www.unicode.org/unicode/reports/tr27/index.html )
U+200D ZERO WIDTH JOINER
The intended semantic is to produce a more connected rendering of adjacent characters than would otherwise be the case, if possible.
1. If the two characters could form a ligature, but do not normally,
ZWJ requests that the ligature be used.
2. Otherwise, if either of the characters could cursively connect, but
do not normally, ZWJ requests that each of the characters take a
cursive-connection form where possible.
   * In a sequence like <X, ZWJ, Y>, where a cursive form exists for X,
     but not for Y, the presence of ZWJ requests a cursive form for X.
3. Otherwise, where neither a ligature nor cursive connection are available,
the ZWJ has no effect.
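
To make the order of those three rules concrete, here is a rough Python sketch of the decision as I read it. The function name and the boolean inputs are my own invention for illustration; a real shaping engine would derive them from the font and the script, not take them as arguments.

```python
def zwj_request(can_ligate, x_has_cursive, y_has_cursive):
    """Return what a ZWJ between X and Y asks the renderer to do.

    Hypothetical helper: the three flags stand in for whatever the
    font and shaping engine actually know about X and Y.
    """
    # Rule 1: a ligature is preferred whenever one is available.
    if can_ligate:
        return ("ligature", ())
    # Rule 2: otherwise ask each character that has a cursive form
    # to use it (this covers the <X, ZWJ, Y> case where only X does).
    sides = tuple(
        name
        for name, has_form in (("X", x_has_cursive), ("Y", y_has_cursive))
        if has_form
    )
    if sides:
        return ("cursive", sides)
    # Rule 3: nothing is available, so the ZWJ has no effect.
    return ("no effect", ())
```

Note that the ligature check comes first, so a pair that could either ligate or connect cursively gets the ligature.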
Starting with Unicode 3.0.1, the definitions of ZWJ and ZWNJ were expanded
to allow for greater control over ligature formation. A reason given for this
is: "In some orthographies the same letters may either ligate or not, depending
on the intended reading".
> Thus, what John Hudson is wanting to do is to have "f" + ZWJ + "i" be
> required to make the "fi" ligature by using the <rlig> feature. Any font
> that does not have OpenType support, or some other smart font
> rendering, would ignore this and not render the ligature.
Right. And any older font lacking a no-width no-contour glyph for ZWJ
would probably display a null box between the "f" and the "i".
> Another example: "a" + ZWJ + combining acute + ZWJ + "e" would be
> required to produce an "ae" ligature with the combining acute over the
> a portion of the ligature. Is this reasonable?
AFAICT, ZWJ is not appropriate for combining glyphs like the combining
acute diacritic. "a" + combining acute + ZWJ + "e" might be reasonably
expected to produce what you've described.
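
To spell out the two sequences being compared, here is a small Python sketch using only the standard unicodedata module. The helper names are mine, and the "mark sits on its base" check is just an illustration of why I think the first ZWJ is out of place, not a rule taken from the Standard.

```python
import unicodedata

ZWJ = "\u200D"    # ZERO WIDTH JOINER
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT

def describe(text):
    """List each code point with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

def marks_follow_base(text):
    """True if no combining mark comes directly after a ZWJ
    (or at the very start of the text)."""
    previous = None
    for ch in text:
        if unicodedata.combining(ch) and previous in (None, ZWJ):
            return False
        previous = ch
    return True

proposed = "a" + ZWJ + ACUTE + ZWJ + "e"   # sequence from the quoted question
suggested = "a" + ACUTE + ZWJ + "e"        # sequence suggested in the reply

for line in describe(suggested):
    print(line)
print(marks_follow_base(proposed), marks_follow_base(suggested))
```

In the suggested sequence the acute stays attached to the "a", and the single ZWJ sits between the two things that are supposed to ligate.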
> Asmus is correct in needing to consider other languages. Saying that
> the ZWJ causes Arabic to ligate would not be correct. It already is
> defined to cause correct contextual shaping (isol, initial, medial, final)
> forms. In fact, LAM + ZWJ + ALEF breaks the required ligature
> formation because it sticks something in the middle of the context and
> proves what the Unicode book says, "in some systems they may break
> up ligatures by interrupting the character sequence required to form
> the ligature." Should font vendors then have to not only code the normal
> ligature formation, but also have to code shaping rules to make the ZWJ
> work as well?
Yes, if font vendors want to provide this level of support. According
to recent posts on the Unicode list, some font vendors are already doing
this because of Unicode's recommendations on the subject. (Please see
"Implementation Notes" under "Controlling Ligatures" in TR27 linked above.)
As for 'interrupting the sequence on some systems', the Unicode
Standard may simply be referring to older, non-compliant systems
which don't ignore these formatting characters where appropriate
and/or have not yet implemented full support for Unicode 3.0.1 and up.
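
Here is a toy Python sketch of what 'interrupting the sequence' looks like in practice. The one-entry ligature table and the glyph name "lam_alef.liga" are made up for the example; this is not how any particular engine or font stores its lookups.

```python
ZWJ = "\u200D"
LAM = "\u0644"   # ARABIC LETTER LAM
ALEF = "\u0627"  # ARABIC LETTER ALEF

# Hypothetical ligature table keyed on adjacent character pairs.
LIGATURES = {(LAM, ALEF): "lam_alef.liga"}

def shape(text, ignore_zwj):
    """Naive pairwise ligature substitution.

    With ignore_zwj=False the ZWJ is matched like any other character,
    so it sits between LAM and ALEF and the pair is never found.
    """
    chars = [c for c in text if not (ignore_zwj and c == ZWJ)]
    out = []
    i = 0
    while i < len(chars):
        pair = tuple(chars[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(f"U+{ord(chars[i]):04X}")
            i += 1
    return out

print(shape(LAM + ZWJ + ALEF, ignore_zwj=False))  # ligature is broken up
print(shape(LAM + ZWJ + ALEF, ignore_zwj=True))   # ligature forms
```

A system that skips the formatting character before matching gets the required ligature back; one that doesn't is the kind of system the book seems to be warning about.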
So, this is already a complicated nightmare for shaping engine
implementers. Sometimes the character should be ignored, but
other times it needs to be a mandatory part of a look-up. Font
developers seeking to follow the Unicode guidelines seem to be
doing so on a 'by gosh and by golly' basis. John Hudson's proposal
offers sensible parameters along with intuitive justification.
Using 'rlig' for ZWJ-based ligation is a clear choice. If an author
takes the trouble to insert a ZWJ, a ligature is required if possible.
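
As a rough illustration only (this is my own toy model in Python, not OpenType feature code and not John Hudson's actual proposal), the 'sometimes a mandatory part of a look-up, sometimes ignored' behaviour could look like this. The table name and the glyph name "f_i" are assumptions.

```python
ZWJ = "\u200D"

# Hypothetical stand-in for an 'rlig'-style lookup: the ZWJ is part of
# the key, so the ligature is only required when the author typed it.
REQUIRED_LIGATURES = {("f", ZWJ, "i"): "f_i"}

def shape(text):
    """Apply ZWJ-keyed required ligatures, then drop any leftover ZWJ."""
    out = []
    i = 0
    while i < len(text):
        key = tuple(text[i:i + 3])
        if key in REQUIRED_LIGATURES:
            # ZWJ is a mandatory part of this look-up.
            out.append(REQUIRED_LIGATURES[key])
            i += 3
        elif text[i] == ZWJ:
            # No ligature available: the ZWJ is ignored, not drawn.
            i += 1
        else:
            out.append(text[i])
            i += 1
    return out

print(shape("f" + ZWJ + "i"))  # ligature requested and available
print(shape("fi"))             # no ZWJ, no required ligature
print(shape("f" + ZWJ + "l"))  # no entry for this pair, ZWJ dropped
```

The point of keying on the ZWJ is that plain "fi" is left to whatever discretionary ligature behaviour the font or application already has.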
This archive was generated by hypermail 2.1.2 : Fri Jul 19 2002 - 02:45:49 EDT