L2/01-049

Sent: Tuesday, January 23, 2001 2:16 PM
Subject: comments on L2/01-042 (SC2/WG2 N2317)

Peter_Constable@sil.org on 01/23/2001 08:48:46 AM
Please respond to unicore@unicode.org

I see that this paper by Michael is on the agenda for the upcoming meeting. Without invalidating the general issues that Michael has raised, I think some comments on some of the specifics may be worth considering:

First, in relation to Tom Milo's problem of creating a font with default ligatures but with a way to break them to get "typewriter" style Arabic, Michael considers the broadened semantics for ZWJ / ZWNJ adopted in TUS3.0.1 and concludes that Tom's "engine should automatically insert ZWJs to cause the ligatures". That's a possibility, but it's not the only one. If Tom wants the default behaviour of his OpenType font to force all ligatures, he can specify that in the font's GSUB table. Then, to break the ligatures, a user could insert ZWNJs (or < ZWJ, ZWNJ, ZWJ >) as needed.

Probably a better OpenType solution, however, would be to make the "typewriter" style the default and to use discretionary ligatures (the dlig feature tag in OT). I believe this feature tag is intended to be used by an app that provides a UI allowing the user to enable (or disable) the feature over a run of text. Thus, in an appropriate app, choosing between fully ligated and "typewriter" style can be a matter of changing text styles. Although the font's default is the typewriter style, a user should be able to set defaults in whatever stylesheets they use that turn on the discretionary ligatures for this font.

In AAT and in Graphite, this type of capability is certainly possible by designing a font to include ligation levels. Again, in an appropriate app, a UI would be provided that allows the user to select the ligation level over a run of text.
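The two font designs described above can be contrasted with a toy model. This is a minimal Python sketch, not OpenType itself: the ligature table, the hypothetical glyph name `lam_meem.lig`, and the `shape` function are illustrative assumptions, and positional (initial/medial/final) forms are ignored. In the first design ligatures are always on and a ZWNJ between two letters breaks them; in the second, the typewriter style is the default and ligation is switched on only for runs whose style enables dlig.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
LAM, MEEM = "\u0644", "\u0645"

# Hypothetical ligature table: (first, second) -> ligature glyph name.
# A real OpenType font would express this as a GSUB ligature lookup on glyph IDs.
LIGATURES = {(LAM, MEEM): "lam_meem.lig"}


def shape(text: str, ligatures_on: bool) -> list[str]:
    """Toy shaper: ligate adjacent pairs found in LIGATURES when ligatures_on is True.

    A ZWNJ between two letters keeps them from matching as a pair; in this toy
    the ZWNJ is then simply dropped from the output since it has no visible glyph.
    Positional (initial/medial/final) forms are deliberately ignored.
    """
    glyphs: list[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ZWNJ:
            i += 1
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ligatures_on and (ch, nxt) in LIGATURES:
            glyphs.append(LIGATURES[(ch, nxt)])
            i += 2
        else:
            glyphs.append(f"U+{ord(ch):04X}")  # unligated glyph, named by code point
            i += 1
    return glyphs


# Design 1: ligatures are the font default; a ZWNJ breaks them locally.
print(shape(LAM + MEEM, ligatures_on=True))         # ['lam_meem.lig']
print(shape(LAM + ZWNJ + MEEM, ligatures_on=True))  # ['U+0644', 'U+0645']

# Design 2: "typewriter" is the default; an app enables dlig over a run.
run_features = {"dlig"}  # hypothetical per-run style setting
print(shape(LAM + MEEM, ligatures_on=False))                   # ['U+0644', 'U+0645']
print(shape(LAM + MEEM, ligatures_on="dlig" in run_features))  # ['lam_meem.lig']
```

The Boolean flag stands in, roughly, for whether the layout engine applies the font's ligature lookups to a given run: always in the first design, and only where a style requests dlig in the second.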
Secondly, in one example Michael suggested that IPA contour tone diacritics could be encoded as sequences involving ZWJ, as in < combining caron, ZWJ, combining grave > to yield a rising-falling tone. I think this proposed encoding has serious problems.

The first problem is that this is not a ligature but a single graphemic element. In a ligature, multiple semantic elements are presented with a single glyph; here, we would want multiple characters to represent a single semantic unit, the rising-falling tone. Thus, < caron, CGJ, grave > is arguably better. That has a failing, though, in how CGJ gets interpreted in terms of visual control. (Michael covered the questions I have raised before regarding the mixing of visual and semantic control that has been proposed for CGJ.)

The second problem is that there would be no reason to prefer < caron, ZWJ, grave > over < grave, ZWJ, circumflex > or < grave, ZWJ, acute, ZWJ, grave > (you can substitute CGJ for ZWJ here if you prefer). The result is that we would have multiple representations of the same contour that ought to be canonically equivalent, and a need to define canonical decompositions from caron to < grave, ZWJ, acute > and from circumflex to < acute, ZWJ, grave >. Obviously, nobody wants to go there. I think the only reasonable treatment of the IPA contour tone diacritics is to propose new diacritics.

Thirdly, I want to repeat a comment I made on PDUTR27. In relation to the original semantics of ZWJ / ZWNJ in Indic, Michael quoted Mark Davis' comment, "We are somewhat inconsistent with [the original ZWJ / ZWNJ semantics] in the Indic world since conjuncts are really more akin to ligatures than they are to cursive connection," to which Michael responded, "I don't agree with this last point; Indic scripts... operate under a different shaping model."
Later, in relation to the broadened semantics for ZWJ / ZWNJ, Michael quoted Mark's comment, "The behaviour [of the extended ZWJ / ZWNJ semantics] is more consistent across all scripts, including Indic," and responded, "I don't see how Indic is affected."

On the original semantics, I think both Mark and Michael were partially correct: perhaps half forms are somewhat like cursive connections, but full conjuncts are certainly more like ligatures than cursive connections. Indic did use a different model, but it perhaps had an element of consistency with the use of ZWJ / ZWNJ in Arabic: just as in Arabic ZWJ forces a cursive connection and ZWNJ prevents it, so in Devanagari ZWJ forces a half form (arguably akin to a cursive connection), while ZWNJ prevents a half form (or a conjunct).

On the new semantics, Michael is correct that Indic is not affected if we assume different models. Mark was trying to suggest, I think, that the broadened semantics provide a consistent model for all scripts (certainly the text that appears in PDUTR27 reads this way). Given the categories

2: ligated
1: cursively connected
0: unconnected

ZWJ (under the broadened semantics) requests the highest category, and ZWNJ the lowest. For Indic, ZWNJ works this way, but ZWJ selects 1 over 2. As Mark observed, full conjuncts are more like ligatures than cursive connections, yet in Indic ZWJ explicitly prevents a full conjunct, choosing instead something that might be conceived of as more akin to a cursive connection, namely a half form.

- Peter

---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail:
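The canonical-equivalence problem raised in the second point can be made concrete. The Python sketch below assumes the usual IPA reading of these diacritics (grave = low, acute = high, circumflex = falling, caron = rising); the `LEVELS` table and the `contour` function are hypothetical illustrations, not part of any proposal. It shows that all three ZWJ sequences spell the same low-high-low (rising-falling) contour, yet normalization with Python's standard `unicodedata` module leaves them as three distinct strings, since none of these characters has a canonical decomposition.

```python
import unicodedata

ZWJ = "\u200D"
GRAVE, ACUTE = "\u0300", "\u0301"       # low tone, high tone
CIRCUMFLEX, CARON = "\u0302", "\u030C"  # falling tone, rising tone

# Hypothetical mapping from each tone diacritic to the level tones it depicts
# (L = low, H = high), following the IPA reading assumed above.
LEVELS = {GRAVE: "L", ACUTE: "H", CIRCUMFLEX: "HL", CARON: "LH"}


def contour(seq: str) -> str:
    """Concatenate the level tones of each diacritic, ignoring ZWJ."""
    return "".join(LEVELS[ch] for ch in seq if ch != ZWJ)


candidates = [
    CARON + ZWJ + GRAVE,                # rising + low
    GRAVE + ZWJ + CIRCUMFLEX,           # low + falling
    GRAVE + ZWJ + ACUTE + ZWJ + GRAVE,  # low + high + low
]

# Every candidate denotes the same rising-falling contour...
print(sorted({contour(s) for s in candidates}))  # ['LHL']
# ...but canonical normalization does not unify the encodings.
print(len({unicodedata.normalize("NFD", s) for s in candidates}))  # 3
```

Making these sequences equivalent would require exactly the kind of new canonical decompositions (caron to < grave, ZWJ, acute >, circumflex to < acute, ZWJ, grave >) that the second point argues against.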
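The ranking model from the third point (2: ligated, 1: cursively connected, 0: unconnected) can likewise be sketched in Python. The names `Connection` and `requested_level` are hypothetical, and the model is a deliberate simplification: it only chooses among the rendering levels a font makes available for a pair, with an `indic` flag standing in for the Indic behaviour in which ZWJ selects a half form rather than a full conjunct.

```python
from enum import IntEnum

ZWJ, ZWNJ = "\u200D", "\u200C"


class Connection(IntEnum):
    UNCONNECTED = 0
    CURSIVE = 1  # cursive connection (Arabic); arguably a half form (Indic)
    LIGATED = 2  # ligature; arguably a full conjunct (Indic)


def requested_level(joiner: str, available: set[Connection], indic: bool = False) -> Connection:
    """Pick a rendering level for a pair under the broadened ZWJ / ZWNJ model.

    ZWNJ requests the lowest available level and ZWJ the highest. When indic
    is True and a half form (CURSIVE) is available, ZWJ selects it instead of
    the full conjunct, modelling the inconsistency discussed above.
    """
    if joiner == ZWNJ:
        return min(available)
    if joiner == ZWJ:
        if indic and Connection.CURSIVE in available:
            return Connection.CURSIVE
        return max(available)
    raise ValueError("joiner must be ZWJ or ZWNJ")


levels = set(Connection)
print(requested_level(ZWJ, levels).name)              # LIGATED
print(requested_level(ZWNJ, levels).name)             # UNCONNECTED
print(requested_level(ZWJ, levels, indic=True).name)  # CURSIVE
```

With the same three levels available, ZWJ yields level 2 in the general model but level 1 for Indic, which is the sense in which the broadened semantics are not fully consistent across scripts.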