L2/00-025

From: Mark Davis
Date: 2000-01-25
Re: Extending ZWJ and ZWNJ for Ligation

The following is a proposal to satisfy the desire of some to have more control over ligation with Unicode. It does this by extending the definition of ZWJ and ZWNJ.

Currently, the description of ZWJ and ZWNJ specifically excludes ligation. That is, f[zwj]i is not to create a ligature, but only to join the characters cursively (if the font allows). Thus [meem][zwj][jeem] will cursively connect, and the [zwj] would not have any effect on ligation.

We are somewhat inconsistent with this in the Indic world, since conjuncts are really more akin to ligatures than they are to cursive connection. The reason we kept them separate was for Arabic, where one could want cursive connection without a ligature. However, a broadening can be done without materially affecting the usage in Arabic.

Here are the definitions I propose.

ZWJ - if possible, produce a more connected rendering of adjacent characters than would otherwise be the case. In particular:

1. If two characters could form a ligature, but do not normally, ZWJ requests that the ligature be used. (If there is no ligature, and the characters would normally cursively connect, ZWJ has no effect.)

    f[zwj]i => fi-ligature
    [meem][zwj][jeem] => [meem-jeem-ligature]
    [meem][zwj][jeem] => [meem-initial][jeem-final] // if there is no ligature

2. If two characters could cursively connect, but do not normally, ZWJ requests that they cursively connect. In particular, if a character A on one side has a cursive form, and the other character B does not, ZWJ requests that A take a cursive form.

    [space][zwj][jeem] => [space][jeem-final]

ZWNJ - if possible, break both cursive connections and ligatures.
Examples:

    f[zwnj]i => f i
    [lam][zwnj][alef] => [lam-isolated][alef-isolated]
    [meem][zwnj][jeem] => [meem-isolated][jeem-isolated]

In other words, given three broad categories:

    2: ligated
    1: cursively connected
    0: unconnected

ZWJ requests that glyphs in the highest available category be used; ZWNJ requests that glyphs in the lowest available category be used.

For those unusual circumstances where someone wants to forbid ligatures in a sequence XY, but promote cursive connection, the sequence X[zwj][zwnj][zwj]Y will work. The [zwnj] breaks ligatures, while the two adjacent joiners cause the X and Y to take adjacent cursive forms where they exist. Similarly, if someone wanted to have X take a cursive form but Y be isolated, then the sequence X[zwj][zwnj]Y could be used (as currently).

Implementation:

For modern font technologies, such as OpenType or AAT, font vendors should add as appropriate. for will probably have the desired effect naturally, without any change, since it is doubtful that non-Arabic fonts deal with it at all, except to map it to an invisible glyph.

Current Arabic shaping algorithms should work as they are; optional ligatures just would not be promoted by ZWJ, but current text should not be affected. The reason is that the current use of ZWJ between characters that normally cursively connect is redundant right now (as a matter of fact, with bad implementations of ZWJ or unsupported ZWJ, the cursive connection would actually be broken), and should occur in very few instances.

If this proposal is accepted, then the additional semantic of promoting ligatures can be added to Arabic implementations over time. The worst that would happen is that a current redundant usage of ZWJ would cause an optional ligature to form.

Pros

- The characters exist right now, and could be applied immediately, without waiting for the addition of a new character.
- The behavior is more consistent across all scripts, including Indic.
- The current use in Arabic would not be substantially affected.
- The number of format characters, which require special implementation handling, is not increased.
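
The three-category model proposed above (2: ligated, 1: cursively connected, 0: unconnected) can be illustrated with a small sketch. This is not part of the proposal and is not how any shaping engine is implemented; the names `Joining` and `requested_category` are hypothetical, and the sketch assumes that every pair of characters can always be rendered unconnected. It only shows the selection rule: ZWJ asks for the highest available category, ZWNJ for the lowest, and the sequence [zwj][zwnj][zwj] for the highest category short of a ligature.

```python
from enum import IntEnum

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


class Joining(IntEnum):
    """The three broad categories from the proposal."""
    UNCONNECTED = 0
    CURSIVE = 1
    LIGATED = 2


def requested_category(available, default, control=""):
    """Return the category requested for a character pair.

    available: categories the font can render for this pair (hypothetical input).
    default:   category used when no control characters intervene.
    control:   the joiner/non-joiner sequence between the two characters.
    """
    # Assumption: an unconnected rendering is always possible.
    available = set(available) | {Joining.UNCONNECTED}
    if control == ZWJ:
        return max(available)                       # most connected form
    if control == ZWNJ:
        return min(available)                       # least connected form
    if control == ZWJ + ZWNJ + ZWJ:
        # Forbid the ligature, but still promote cursive connection.
        return max(available - {Joining.LIGATED})
    return default


# f + i in a font with an optional fi ligature and no cursive forms.
latin = {Joining.UNCONNECTED, Joining.LIGATED}
# meem + jeem in a font with both cursive forms and a ligature.
arabic = {Joining.UNCONNECTED, Joining.CURSIVE, Joining.LIGATED}

fi_joined = requested_category(latin, Joining.UNCONNECTED, ZWJ)
fi_broken = requested_category(latin, Joining.UNCONNECTED, ZWNJ)
mj_joined = requested_category(arabic, Joining.CURSIVE, ZWJ)
mj_broken = requested_category(arabic, Joining.CURSIVE, ZWNJ)
mj_cursive_only = requested_category(arabic, Joining.CURSIVE, ZWJ + ZWNJ + ZWJ)
```

Under these assumptions, a font with no meem-jeem ligature would simply omit the ligated category from `available`, so ZWJ leaves the normal cursive connection unchanged, matching the "if there is no ligature" case in definition 1.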