Fraktur, ligatures, ZWJ and ZWNJ (was: Re: ZWJ, ZWNJ and VS in Latin and other Greek-derived scripts)

From: Karl Pentzlin (
Date: Fri Jan 26 2007 - 07:57:44 CST


    On Friday, 26 January 2007 at 02:05, John H. Jenkins wrote:

    JHJ> In any event, I reiterate: Ligature formation in Latin is a matter of
    JHJ> stylistic preference.

    Not in every case. For typesetting German Fraktur, ligature formation
    is a matter of orthographic rules.
    In this context, the German term "Ligatur" has two closely related but
    different meanings (both of which nevertheless translate into English
    as "ligature"):
    1.) A closed and orthographically precisely defined set of types showing
        two or three letters visually linked in a standardized way. The use of
        these ligature types is governed by orthographic rules. Thus, the
        ligature as a whole is subject to orthography.
        I call this an "O-ligature" in the remainder of this text.
    2.) The usual meaning: combinations of two or more characters that are
        preferred over the separate characters for aesthetic reasons, while
        the individual components remain the units subject to orthography.
        I call this an "E-ligature" in the remainder of this text.

    Simplified: in Fraktur, ligatures in the first sense (O-ligatures) are
    used within syllables but not across syllable boundaries.
    E.g., when typesetting "finden" (to find), using a "fi" O-ligature is
    required, i.e. a type with a dotless i under the bow of the f, linked
    with the f's bar (this is a reference glyph, not a variant chosen for
    aesthetic reasons).
    When typesetting "Schilfinsel" (reed island), using a "fi"
    O-ligature is an orthographic error. This does not prevent the font
    designer from creating an E-ligature that contains a dotted i not linked
    with the f bar but looks better than a sequence of unrelated
    f and i.

    Now, how should Fraktur be encoded in Unicode plain text? Premises:
    a.) The presentation forms U+FB00...U+FB05 are to be avoided (as
        usual); in any case, they cover only a proper subset of the
        O-ligatures required for correct Fraktur typesetting, lacking e.g. "tz".
    b.) No semantic analysis shall be required to determine the
        orthographically correct types.
    c.) The representation of the text in non-Fraktur fonts shall
        be interfered with as little as possible.
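
    To check premise a. concretely, the following Python sketch lists the
    presentation forms U+FB00..U+FB05 together with their compatibility
    decompositions, using only the standard unicodedata module, and tests
    whether "tz" is among them. It is an illustration only, not part of
    any proposal.

```python
import unicodedata

# The presentation forms named in premise a.: U+FB00 .. U+FB05.
LIGATURE_FORMS = [chr(cp) for cp in range(0xFB00, 0xFB06)]


def components(ch: str) -> str:
    """Return the letters a presentation-form ligature decomposes to.

    unicodedata.decomposition() yields e.g. '<compat> 0066 0069' for U+FB01.
    """
    fields = unicodedata.decomposition(ch).split()
    # Drop the '<compat>' tag and turn the hex code points back into text.
    return "".join(chr(int(f, 16)) for f in fields if not f.startswith("<"))


covered = {components(ch) for ch in LIGATURE_FORMS}

for ch in LIGATURE_FORMS:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<32} -> {components(ch)!r}")

# "tz", required for Fraktur, has no presentation form in this range.
print("tz covered:", "tz" in covered)
```

    The range yields ff, fi, fl, ffi, ffl and long-s+t; "tz" is absent.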

    Premise b. rules out the option of letting the presentation software
    determine whether an O-ligature is appropriate for a given
    sequence of base characters.

    This leaves two possibilities:
    1.) Require ZWJ to mark O-ligatures wherever they are required.
    2.) Assume an O-ligature whenever the sequence of its base letters
        occurs, and require ZWNJ where the O-ligature would be erroneous.
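
    The two possibilities can be compared with a small Python sketch. It
    assumes a hypothetical input convention in which a human marks
    ligature-blocking boundaries with "|", and a hypothetical ligature
    inventory limited to "fi" and "tz"; the function name encode() is
    likewise invented here. It produces the plain text each possibility
    would require.

```python
ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical, deliberately tiny O-ligature inventory taken from the
# examples in this message ("fi", "tz"); a real Fraktur set is larger.
O_LIGATURE_PAIRS = {("f", "i"), ("t", "z")}


def encode(annotated: str, scheme: int) -> str:
    """Encode a word whose ligature-blocking boundaries are marked with '|'.

    scheme 1: insert ZWJ inside every pair that must form an O-ligature.
    scheme 2: insert ZWNJ inside every pair that must NOT form one.
    The '|' markup is a hypothetical convention for this sketch: a human
    supplies the boundary knowledge, so no software analysis is needed.
    """
    if scheme not in (1, 2):
        raise ValueError("scheme must be 1 or 2")
    out: list[str] = []
    at_boundary = False
    for ch in annotated:
        if ch == "|":
            at_boundary = True
            continue
        prev = out[-1] if out else ""
        if (prev, ch) in O_LIGATURE_PAIRS:
            if scheme == 1 and not at_boundary:
                out.append(ZWJ)
            elif scheme == 2 and at_boundary:
                out.append(ZWNJ)
        out.append(ch)
        at_boundary = False
    return "".join(out)


# Count how many control characters each scheme needs on a toy sample.
sample = ["finden", "Schilf|insel", "Platz"]
for scheme in (1, 2):
    encoded = [encode(word, scheme) for word in sample]
    marks = sum(c in (ZWJ, ZWNJ) for word in encoded for c in word)
    print(f"scheme {scheme}: {marks} mark(s) ->", encoded)
```

    On this three-word sample, possibility 1 needs two ZWJs ("finden",
    "Platz") and possibility 2 one ZWNJ ("Schilfinsel"). A toy sample says
    nothing about real frequencies; it only shows where each mark goes.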

    In my view, possibility 2. is strongly preferable. It does
    not interfere with texts where the difference between O-ligatures and
    E-ligatures is irrelevant, and does not affect the known semantics
    of ZWJ/ZWNJ in any way. Thus, it fulfills premise c.
    Moreover, it is easier to type, as the cases where ZWNJ must be
    entered under 2. are rarer than those needing ZWJ under 1., and they
    are more likely to be perceived as exceptions requiring care, unlike
    the normal character sequences where O-ligatures are appropriate.

    Thus, a well-designed OpenType Fraktur font could have a "fi" glyph
    appropriate as an O-ligature for the sequence "f"+"i". This also works
    for non-German texts, where this glyph is expected in all cases.
    It could have an E-ligature showing an unconnected but specially
    designed "fi" glyph for the sequence "f"+ZWNJ+"i".
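
    The glyph choice such a font would make can be simulated in Python.
    This is not OpenType code: the glyph names "f_i" and "f_i.unlinked"
    and the function shape() are hypothetical, and a real font would
    express the same decisions in its own substitution lookups.

```python
ZWNJ = "\u200c"

# Hypothetical glyph names; only the "fi" pair is modelled here.
O_LIGATURE = {"fi": "f_i"}            # linked O-ligature with dotless i
E_LIGATURE = {"fi": "f_i.unlinked"}   # unconnected, specially fitted pair


def shape(text: str) -> list[str]:
    """Map possibility-2 text to glyph names with a greedy left-to-right scan."""
    glyphs: list[str] = []
    i = 0
    while i < len(text):
        if text[i] == ZWNJ:
            i += 1                        # a stray ZWNJ is not drawn
        elif text[i:i + 2] in O_LIGATURE:
            # Plain "f"+"i": the O-ligature is the default.
            glyphs.append(O_LIGATURE[text[i:i + 2]])
            i += 2
        elif text[i + 1:i + 2] == ZWNJ and text[i] + text[i + 2:i + 3] in E_LIGATURE:
            # "f"+ZWNJ+"i": O-ligature blocked, fitted E-ligature instead.
            glyphs.append(E_LIGATURE[text[i] + text[i + 2]])
            i += 3
        else:
            glyphs.append(text[i])
            i += 1
    return glyphs


print(shape("finden"))
print(shape("Schilf" + ZWNJ + "insel"))
```

    In a non-Fraktur font without these glyphs, the same text would simply
    show f and i, which is what premise c. asks for.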

    I know of no existing Fraktur font using either of the two mechanisms.
    Even the fonts which do have the required O-ligatures (and thus
    are not merely pseudo-Fraktur fonts) have them as single
    characters on arbitrary code points (that is, there seem to be no
    truly Unicode-compatible Fraktur fonts unless you accept PUA use).

    Besides the ligature problem, premise c. could be fulfilled for
    Fraktur texts if there were variation selectors for U+0073 s
    specifying either "round s in all cases" or "use the long form only
    when using a broken-letter (blackletter) font; use the round form in
    all other cases", leaving U+017F "long s" as a completely different
    letter that inherently has the long form in all cases.
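
    A sketch of how a renderer might resolve such selectors, in Python.
    The choice of VS1 (U+FE00) and VS2 (U+FE01) for the two meanings is
    invented for illustration, as is the function name; the output simply
    uses "s" for the round form and U+017F for the long form.

```python
LONG_S = "\u017f"  # LATIN SMALL LETTER LONG S

# Hypothetical assignments invented for this sketch; they are not
# registered variation sequences.
VS_ROUND = "\ufe00"                # VS1: "round s in all cases"
VS_LONG_IN_BLACKLETTER = "\ufe01"  # VS2: long s only in a broken-letter font


def choose_s_forms(text: str, blackletter: bool) -> str:
    """Resolve s + variation selector into the letterform to display.

    Plain 's' without a selector is assumed (the proposal leaves it open)
    to stay round. U+017F passes through unchanged: it is long in every font.
    """
    out = []
    i = 0
    while i < len(text):
        ch, nxt = text[i], text[i + 1:i + 2]
        if ch == "s" and nxt == VS_ROUND:
            out.append("s")
            i += 2
        elif ch == "s" and nxt == VS_LONG_IN_BLACKLETTER:
            out.append(LONG_S if blackletter else "s")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


hase = "Ha" + "s" + VS_LONG_IN_BLACKLETTER + "e"   # "Hase" (hare)
haus = "Hau" + "s" + VS_ROUND                      # "Haus" (house)
print(choose_s_forms(hase, blackletter=True), choose_s_forms(haus, blackletter=True))
print(choose_s_forms(hase, blackletter=False), choose_s_forms(haus, blackletter=False))
```

    The same encoded text thus shows long s only in a blackletter font and
    round s everywhere else.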

    To standardize Fraktur handling in Unicode, something like a UTR
    (Unicode Technical Report) might be appropriate.

    - Karl Pentzlin
