Re: Ligatures in Turkish and Azeri, was: Accented ij ligatures

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Thu Jul 10 2003 - 15:34:38 EDT


    On Thursday, July 10, 2003 8:37 PM, Kenneth Whistler <kenw@sybase.com> wrote:

    > Peter Kirk asked:
    >
    > > > In Turkish and Azeri the sequences f - i and f - dotless i both
    > > > occur, and are fairly frequent. So it is inappropriate in these
    > > > languages to use fi ligatures in which the dot on the i is lost
    > > > or invisible, at least where the second character is a dotted i.
    > > > Has any thought been given to this issue? Is it possible to block
    > > > such ligation on a language-dependent basis?
    > >
    >
    > and Philippe Verdy responded with another question:
    >
    > > Isn't there a "Grapheme Disjoiner" format control character to
    > > force the absence of a ligature like <fi>, i.e. <f, GDJ, i>?
    >
    > The answer to Philippe's rejoinder question is no, there is not
    > a "Grapheme Disjoiner" format control character.

    I did not refer to a specific Unicode character. I knew that one
    is already dedicated to this purpose, but I did not want to
    comment on that choice.

    There's no contradiction. The "Grapheme Disjoiner", for you, is
    ZWNJ. OK.

    And I did not want to promote any change to any validly
    encoded legacy text, only to suggest ways to solve the
    apparent rendering problem in Turkish, where the encoded
    character pair <f, i> may be badly rendered. For the actual
    rendering, selecting a <fi> ligature is not appropriate for
    Turkish, and in fact the decomposed sequence <f, i> carries
    no linguistic ambiguity in Turkish.

    So whatever the encoded <fi> codepoint designates, it is not
    the <fi> ligature glyph but really two characters, whose ligation
    may still be performed according to language context.
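
    To illustrate the point about the <fi> codepoint standing for two
    characters, here is a minimal sketch using Python's standard
    unicodedata module. As far as I can tell, the Unicode Character
    Database records the mapping of U+FB01 to <f, i> as a
    compatibility mapping, so only the "K" normalization forms
    expand it. The variable names are mine and purely illustrative.

```python
import unicodedata

FI_LIGATURE = "\ufb01"  # U+FB01 LATIN SMALL LIGATURE FI

# Decomposition mapping as recorded in the Unicode Character Database.
decomp = unicodedata.decomposition(FI_LIGATURE)

# Canonical normalization leaves the ligature codepoint untouched...
nfc = unicodedata.normalize("NFC", FI_LIGATURE)

# ...while compatibility normalization expands it to the two letters.
nfkc = unicodedata.normalize("NFKC", FI_LIGATURE)

print(decomp)                      # the "<compat>" tag marks a compatibility mapping
print([hex(ord(c)) for c in nfc])
print([hex(ord(c)) for c in nfkc])
```

    Once the text holds plain <f, i>, whether the pair is drawn as a
    ligature is a decision left to the rendering side.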

    A font that automatically selects a <fi> ligature to represent
    a sequence of <f, i> codepoints, merely because the <fi>
    codepoint is compatibility-equivalent to that sequence, is
    probably defective and not conforming. Such ligature selection
    must be put under the control of the renderer with additional
    markup, which can in fact select among three ligatures: the
    <fi> ligature glyph where the f is ligated with the dot above
    the i (the normal ligature for languages other than
    Turkish/Azeri), and the <f-dotted-i> and <f-dotless-i>
    ligatures for Turkish/Azeri.
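
    As a rough sketch of what such language-dependent selection could
    look like on the renderer side, consider the following. The
    function pick_ligature and the glyph names ("f_i", "f_i.dotted",
    "f_dotlessi") are hypothetical, invented for this illustration;
    they are not taken from any real font or rendering engine.

```python
from typing import Optional

DOTLESS_I = "\u0131"    # U+0131 LATIN SMALL LETTER DOTLESS I
TURKIC = {"tr", "az"}   # languages where i and dotless i contrast


def pick_ligature(first: str, second: str, lang: str) -> Optional[str]:
    """Return a hypothetical glyph name for the pair, or None for no ligature."""
    if first != "f":
        return None
    primary = lang.lower().split("-")[0]  # "az-Latn" -> "az"
    if primary in TURKIC:
        if second == "i":
            return "f_i.dotted"   # ligature that keeps the dot visible
        if second == DOTLESS_I:
            return "f_dotlessi"   # ligature for f + dotless i
        return None
    if second == "i":
        return "f_i"              # usual ligature, dot merged into the f
    return None


print(pick_ligature("f", "i", "en"))
print(pick_ligature("f", "i", "tr"))
print(pick_ligature("f", DOTLESS_I, "az-Latn"))
```

    The language tag here is assumed to come from markup surrounding
    the text, not from the encoded characters themselves.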

    Markup is necessary to select the appropriate glyph, or the
    glyph can be selected by using the "Grapheme Disjoiner" (ZWNJ)
    or the "Grapheme Joiner" (ZWJ) in addition to an <i> or
    <dotless-i> codepoint, possibly followed by the <dot-above>
    diacritic. All this enrichment of the text is assumed to be
    under the control of the markup added to the original text,
    which itself does not need to specify whether ligatures should
    or should not be used.
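
    For the ZWNJ approach, a minimal sketch of a preprocessing step
    that could be applied to text already known (from markup) to be
    Turkish or Azeri follows. The helper names block_fi_ligature and
    unblock_fi_ligature are hypothetical, and whether a given
    renderer actually honours ZWNJ for Latin ligatures is an
    assumption that would need checking.

```python
import re

ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER


def block_fi_ligature(text: str) -> str:
    """Insert ZWNJ between f and a following dotted i."""
    return re.sub(r"f(?=i)", "f" + ZWNJ, text)


def unblock_fi_ligature(text: str) -> str:
    """Undo block_fi_ligature, restoring the plain <f, i> sequence."""
    return text.replace("f" + ZWNJ + "i", "fi")


marked = block_fi_ligature("fikir")
print([hex(ord(c)) for c in marked])
print(unblock_fi_ligature(marked) == "fikir")
```

    Sequences of f followed by dotless i are left untouched by this
    sketch, since no ordinary <fi> ligature would hide a dot there.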


