Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Jul 10 2003 - 16:13:22 EDT


    > > and Philippe Verdy responded with another question:
    > >
    > > > Isn't there a "Grapheme Disjoiner" format control character to
    > > > force the absence of a ligature like <fi>, i.e. <f, GDJ, i>?
    > >
    > > The answer to Philippe's rejoinder question is no, there is not
    > > a "Grapheme Disjoiner" format control character.
    >
    > I did not refer to a specific unicode character, I knew that there
    > is one already dedicated, but I did not want to comment about
    > this choice.
    >
    > There's no contradiction. The Grapheme Disjoiner, for you, is
    > ZWNJ. OK.

    <ad hominem>

    Every so often, Philippe, it would be refreshing if, when someone
    points out an error in your claims about the Unicode Standard,
    you would simply acknowledge the error and discontinue
    making the claim, instead of coming back and trying to claim
    that the error was just another way of being right.

    </ad hominem>

    There is a separate character, U+034F COMBINING GRAPHEME JOINER,
    which is the "grapheme joiner", abbreviation "CGJ" in the
    standard. That character has nothing to do with ligation
    control. There has also been debate, on several occasions,
    within the UTC, regarding the advisability of encoding
    a "grapheme non-joiner", as a pair with the "grapheme joiner".
    But again, such a grapheme non-joiner -- which has *not* been
    encoded, by the way -- would have nothing to do with ligation
    control.
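    Readers who want to check this distinction against the character
    data can use Python's standard `unicodedata` module, which reports
    each character's name and General_Category. A minimal sketch; the
    helper `describe` is illustrative only. It shows that CGJ is a
    combining mark (Mn), while ZWNJ and ZWJ are format controls (Cf).

```python
import unicodedata

CGJ = "\u034f"   # COMBINING GRAPHEME JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, character name, General_Category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in (CGJ, ZWNJ, ZWJ):
    print(*describe(ch), sep="  ")
# U+034F  COMBINING GRAPHEME JOINER  Mn
# U+200C  ZERO WIDTH NON-JOINER  Cf
# U+200D  ZERO WIDTH JOINER  Cf
```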

    So it is a disservice to the list, perpetuating confusion, to
    invent the term "Grapheme Disjoiner" and use it in a series
    of notes regarding ligation control, when the standard already
    designates the ZWJ and the ZWNJ as the relevant format controls
    for ligation.

    So it is not that for me "the Grapheme Disjoiner is the ZWNJ";
    rather, it is for the Unicode Standard that the ZWNJ is the
    designated, standardized format control for ligation control
    of the sort you are talking about. Please learn the terminology
    and make correct use of it.
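    In plain-text terms, the sequence asked about earlier, <f, GDJ, i>,
    is spelled <f, ZWNJ, i> in the standard, and <f, ZWJ, i> requests a
    ligated form where the font provides one. Either way it is only a
    request to the renderer. A minimal sketch; `with_control` is a
    hypothetical helper, not a library function. Because neither
    control has a decomposition mapping, normalization leaves them in
    place.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: asks that the neighbours not be ligated
ZWJ = "\u200d"   # ZERO WIDTH JOINER: asks for a joined form if the font has one

def with_control(left: str, right: str, control: str) -> str:
    """Hypothetical helper: put one ligation control between two characters."""
    return left + control + right

no_lig = with_control("f", "i", ZWNJ)   # <f, ZWNJ, i>
want_lig = with_control("f", "i", ZWJ)  # <f, ZWJ, i>

print([f"U+{ord(c):04X}" for c in no_lig])  # ['U+0066', 'U+200C', 'U+0069']

# ZWNJ has no decomposition mapping, so every normalization form keeps it.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form, unicodedata.normalize(form, no_lig) == no_lig)  # True for each form
```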

    > A font that would automatically select a <fi> ligature to represent
    > a sequence of <f, i> codepoints, from the fact that the <fi>
    > codepoint is canonically equivalent

    U+FB01 LATIN SMALL LIGATURE FI is not a *canonical* equivalent to
    <f, i>; it is *compatibility* equivalent. That is an important
    distinction.
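    The difference is visible in the Unicode Character Database. The
    decomposition mapping of U+FB01 carries the <compat> tag, so the
    canonical forms NFC and NFD leave the ligature character alone.
    Only the compatibility forms NFKC and NFKD fold it to plain <f, i>.
    A short check with Python's `unicodedata` module:

```python
import unicodedata

FI = "\ufb01"  # LATIN SMALL LIGATURE FI

# The <compat> tag marks a compatibility (not canonical) decomposition.
print(unicodedata.decomposition(FI))           # <compat> 0066 0069

# Canonical normalization forms do not touch U+FB01...
print(unicodedata.normalize("NFC", FI) == FI)  # True
print(unicodedata.normalize("NFD", FI) == FI)  # True
# ...while the compatibility forms replace it with <f, i>.
print(unicodedata.normalize("NFKC", FI))       # fi
print(unicodedata.normalize("NFKD", FI))       # fi
```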

    > is probably defective and not
    > conforming.

    Wrong. There is nothing nonconformant about fonts automatically
    ligating <f, i> (or any other sequence). Such automatic
    ligation may not always be appropriate or the desired result
    for an end user, but that has nothing to do with the conformance
    requirements of the standard.

    > Such selection of ligature must be put under the
                                 ^^^^
                                 
    Wrong. "must" --> "may"

    > control of the renderer with additional markup, which can in fact
    > select among three ligatures in Turkish: the <fi> ligature glyph
    > where the f is ligated with the dot above i (normal ligature for
    > languages other than Turkish/Azeri, the <f-dotted-i> and
    > <f-fotted-i> ligatures for Turkish/Azeri.

    It is unclear that any such ligatures are required or desirable
    for Turkish/Azeri, in any case.

    > Markup is necessary to select the appropriate glyph, or this
      ^^^^^^^^^^^^^^^^^^^
      
    Wrong. A higher-level protocol is needed, and that may involve
    markup. But the Turkish requirements can equally well be
    met by simply choosing "no ligature" style settings for
    the relevant fonts.
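    One way to picture such a higher-level protocol is a style layer
    that switches font features by language tag and never edits the
    characters. The sketch below is hypothetical throughout:
    `RunStyle`, `style_for`, and `NO_LIGATURE_LANGS` are invented for
    illustration. The only real identifier is the tag "liga", the
    OpenType standard-ligatures feature.

```python
from dataclasses import dataclass

# Hypothetical policy: languages whose text is styled without standard ligatures.
NO_LIGATURE_LANGS = {"tr", "az"}

@dataclass
class RunStyle:
    """Hypothetical style attached to a run of text by a higher-level protocol."""
    language: str
    features: dict[str, bool]  # font feature switches, keyed by OpenType-style tag

def style_for(language: str) -> RunStyle:
    """Choose font features from the primary language subtag; the text is untouched."""
    primary = language.split("-")[0].lower()
    return RunStyle(language, {"liga": primary not in NO_LIGATURE_LANGS})

print(style_for("tr").features)     # {'liga': False}
print(style_for("en-US").features)  # {'liga': True}
```

    The string "film" keeps the same code points under either style.
    Only the instruction to the font changes.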

    > can be selected by using the "Grapheme Disjoiner" (ZWNJ)
                                   ^^^^^^^^^^^^^^^^^^^^
                                   
    Wrong term. See above.

    > or the "Grapheme Joiner" (ZWJ) in addition to the use of
             ^^^^^^^^^^^^^^^^^
             
    Wrong term. See above.

    > a <i> or <dotless-i> codepoint eventually followed by the
    > <i-above> diacritic.

    And in any case, it is inadvisable to be suggesting use of
    ZWJ and ZWNJ in this way to solve the problem of assuring that
    Turkish texts don't ligate inappropriately on rendering.

    > All this enrichment of text is assumed
    > to be under the control of the markup added to the original
    > text which does not need to specify whether ligatures should
    > or should not be used.

    This last clause I agree with. But the implication that
    markup has to be added to Turkish text in order to get it
    to render correctly regarding ligature usage is incorrect.
    Adding markup to the text is "adding to the original text"
    as surely as adding ZWNJ format controls would be. In any
    case it is unnecessary, since alternatives exist which simply
    specify suppression (or use) of ligatures stylistically in
    the fonts.
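    To see concretely that inserting ZWNJ alters the stored text, here
    is a minimal sketch. `insert_zwnj` is a hypothetical helper that
    places ZWNJ between "f" and a following "i" or dotless "ı"
    (U+0131). After the insertion, a plain substring search for the
    original word fails. Search or collation code that ignores
    default-ignorable characters could hide the difference, but the
    text itself has still changed.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

def insert_zwnj(text: str, first: str = "f", followers: str = "i\u0131") -> str:
    """Hypothetical: put ZWNJ between `first` and any next char found in `followers`.

    U+0131 is LATIN SMALL LETTER DOTLESS I, used in Turkish and Azeri.
    """
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if ch == first and i + 1 < len(text) and text[i + 1] in followers:
            out.append(ZWNJ)
    return "".join(out)

original = "film"
altered = insert_zwnj(original)
print(len(original), len(altered))            # 4 5
print("film" in altered)                      # False: plain search no longer matches
print(altered.replace(ZWNJ, "") == original)  # True: the only change is the ZWNJ
```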

    --Ken



    This archive was generated by hypermail 2.1.5 : Thu Jul 10 2003 - 16:57:48 EDT