Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

From: Philippe Verdy
Date: Sun Jul 13 2003 - 04:51:48 EDT


    On Sunday, July 13, 2003 7:21 AM, John Cowan wrote:

    > > Check common fonts like Trebuchet MS, Berkeley Book, Goudy Sans,
    > > Korinna and Univers for recognizable _Et_ ampersands.
    > I hand-write & by making a tall lower-case epsilon glyph and then
    > drawing a solidus over it.

    All this discussion shows that there is an extremely large number of
    glyph variations for the ampersand, which is both (at the abstract
    level) a symbol character and a ligature of two lowercase abstract
    characters. But ligatures for the uppercase "ET" and titlecase "Et"
    exist as well. In Unicode, only the abstract symbol is encoded,
    not the ligatures, even though they share a common set of glyphs.

    There are many other ligatures in the Latin script, extending up to
    handwriting, which consists mostly of ligatures spanning complete
    words. Since Gutenberg, printing and publishing have reduced
    their number as a way to simplify the reproduction of
    handwriting, using the old classical forms from Roman Latin
    and Classical Greek, in which characters were rarely ligated.

    Among today's scripts, Arabic and the Indic scripts have kept most of
    their ligatures in printed form, probably because it was considered
    important to preserve not only the semantics of the published text
    but also its artistic graphic form.

    I therefore won't promote the encoding of ligatures for "et", "ET",
    "Et", as this would be endless. We already have the necessary tools
    in Unicode to represent, in the abstract text, the
    differentiation between the symbol use and the ligature use
    where we need it: ZWJ (if really needed), or font renderer features
    by which these ligatures are automatically selected from available
    fonts.
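
    As a rough illustration of the distinction above, here is a minimal
    Python sketch of the two abstract encodings: the symbol "&" versus
    the letters "e" and "t" joined by ZWJ (U+200D) as a ligature request.
    The helper name strip_joiners is hypothetical, and whether a renderer
    actually honours the ZWJ depends entirely on the font and the
    rendering engine.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER

symbol_use = "&"                # the abstract ampersand symbol
ligature_use = "e" + ZWJ + "t"  # letters "et" with a ligature request

def strip_joiners(text):
    """Return the text with ligature requests (ZWJ) removed."""
    return text.replace(ZWJ, "")

# Show the code points behind each form.
for label, s in (("symbol", symbol_use), ("ligature", ligature_use)):
    print(label, [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s])
```

    Removing the joiner gives back the plain letters "et", so the
    ligature request never turns the text into the symbol.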

    However the choice of the ligatures to use for the letters "et" is
    quite large, and I wonder how a font renderer would be able to
    choose between them, without encoding somewhere in the
    abstract text some styling information:

    Could variation selectors be used? I see that Unicode
    does not allow free use of variation selectors: they are defined
    only for cases where it is important to preserve the
    precise semantics of the encoded text, not as a way to
    preserve glyph information (so character variants are
    strictly limited).

    I don't see a solution for this "problem" within Unicode itself
    (nor in ISO/IEC 10646), unless a separate standard
    is started to encode glyphs mapped to characters
    (in the UCS-4 space, outside its first 17 planes?). For now the
    safest way is to use specific fonts that encode these glyphs
    in PUA positions, and to bind these fonts to the abstract text
    using stylesheets, meta information, or markup languages.
    But with such a technique, the abstract text would be modified.
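
    For reference, a small Python sketch of what "PUA positions" means
    at the code point level. The three ranges are the private-use areas
    of the BMP and of planes 15 and 16; the function name is_private_use
    is hypothetical, and U+F001 is used only as an example position, not
    as an assignment made by any particular font.

```python
import unicodedata

# Private-use ranges: BMP PUA, plane 15, plane 16.
PUA_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)

def is_private_use(code_point):
    """True if the code point lies in one of the private-use areas."""
    return any(low <= code_point <= high for low, high in PUA_RANGES)

EXAMPLE_ET_GLYPH = 0xF001  # example PUA position for an "et" ligature glyph

print(is_private_use(EXAMPLE_ET_GLYPH))
print(unicodedata.category(chr(EXAMPLE_ET_GLYPH)))  # general category of a PUA character
```

    A PUA character carries no standard meaning, which is why replacing
    "et" with it alters the abstract text.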

    A way to avoid this is to surround the text with markup that
    specifies an explicit substitution, like this in XML:

        <typo as="&#xF001;">et</typo>,
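
    To make the idea concrete, here is a minimal Python sketch, assuming
    the hypothetical <typo> element shown above sits inside some
    enclosing element (the <p> wrapper and the sample sentence are
    invented for illustration). It derives both the unmodified abstract
    text and a display string in which the substitution is applied.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <typo> markup embedded in a paragraph.
SAMPLE = '<p>Dupont <typo as="&#xF001;">et</typo> fils</p>'

def abstract_text(element):
    """Plain text with all markup ignored: the letters stay 'et'."""
    return "".join(element.itertext())

def display_text(element):
    """Text for rendering: each <typo as="..."> is replaced by its glyph."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == "typo" and "as" in child.attrib:
            parts.append(child.attrib["as"])   # substituted PUA glyph
        else:
            parts.append(display_text(child))  # recurse into other markup
        parts.append(child.tail or "")
    return "".join(parts)

root = ET.fromstring(SAMPLE)
print(repr(abstract_text(root)))
print(repr(display_text(root)))
```

    Searching, sorting and spell-checking can then work on the abstract
    text, while only the rendering path sees the PUA code point.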

    One can use a better technique with stylesheets, if multiple
    fonts are used and the glyph substitutions must be made
    under the control of a stylesheet, like this in XHTML:

        <typo class="tallepsilon-tau">et</typo>

    or by using separately-defined SGML named entities:


    (this way, we don't need to specify the PUA codepoint
    value which may vary across available fonts)
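
    The indirection through a symbolic name can be sketched in Python as
    a per-font lookup table. Everything here is an assumption made for
    illustration: the font names, the code points, and the render helper
    are hypothetical, and a real stylesheet mechanism would look
    different.

```python
# Hypothetical per-font tables: symbolic glyph name -> PUA code point.
GLYPH_TABLES = {
    "HypotheticalFontA": {"tallepsilon-tau": 0xF001},
    "HypotheticalFontB": {"tallepsilon-tau": 0xE4A0},
}

def render(text, glyph_name, font):
    """Return the font's glyph for glyph_name, or the plain text as fallback."""
    code_point = GLYPH_TABLES.get(font, {}).get(glyph_name)
    return chr(code_point) if code_point is not None else text

for font in ("HypotheticalFontA", "HypotheticalFontB", "SomeOtherFont"):
    print(font, repr(render("et", "tallepsilon-tau", font)))
```

    The document only ever names the glyph; a font that lacks it falls
    back to the ordinary letters "et".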

    Spam not tolerated: any unsolicited message will be
    reported to your Internet service providers.
