Re: Ligatures in Portuguese, French (was: ... Turkish and Azeri)

From: Doug Ewell (
Date: Sun Jul 13 2003 - 19:31:22 EDT


    Philippe Verdy <verdy_p at wanadoo dot fr> wrote:

    > All this discussion shows that there is an extremely large number of
    > glyph variations for the ampersand, which is both (at the abstract
    > level) a symbol character and a ligature of two lowercase abstract
    > characters. But ligatures for the uppercase "ET" and titlecase "Et"
    > do exist as well. For Unicode, only the abstract symbol is encoded,
    > but not the ligatures, even though they share a common set of glyphs.

    That is one of the essential features of Unicode. Abstract characters
    are encoded; glyph variants (in general) are not.
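As a rough illustration of that point, the sketch below uses Python's standard `unicodedata` module to show that the ampersand is encoded as a single symbol with no decomposition into "et", whereas a compatibility ligature such as U+FB01 does decompose. The helper name `describe` is made up for this example.

```python
import unicodedata

AMPERSAND = "\u0026"    # '&', encoded as a symbol character
FI_LIGATURE = "\ufb01"  # a compatibility ligature, shown for contrast

def describe(ch):
    """Return (code point, character name, NFKD form) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.normalize("NFKD", ch),
    )

amp = describe(AMPERSAND)
fi = describe(FI_LIGATURE)
print(amp)  # ('U+0026', 'AMPERSAND', '&')
print(fi)   # ('U+FB01', 'LATIN SMALL LIGATURE FI', 'fi')
```

Whatever glyph a font draws for U+0026, the plain text contains the same code point.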

    > Could the variant selectors be used? I see that Unicode
    > does not allow free use of variant selectors, which are defined
    > only for cases where it would be important to preserve the
    > precise semantics of the encoded text, but not as a way to
    > preserve the glyphic information (so character variants are
    > strictly limited).

    That's correct. The difference between the Arial-style glyph that looks
    a bit like a tilted treble clef (U+1D11E) and John's
    epsilon-with-solidus and Philippe's e-with-small-attached-t is one of
    style only. The distinction does not need to be encoded in plain text,
    any more than the distinction between a lowercase g with one bowl versus
    two.

    Apparently the math experts really, really needed to make a distinction
    in plain text between (e.g.) a less-than-or-equal sign with a horizontal
    bottom stroke and one with a slanted bottom stroke. We can take it on
    faith that that distinction is important in plain text, but we don't
    need to add more distinctions that probably aren't.
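For comparison, a small sketch of the math case: the two less-than-or-equal signs are separate code points with separate names, and normalization does not fold one into the other. The variable names are illustrative only.

```python
import unicodedata

LE_HORIZONTAL = "\u2264"  # less-than-or-equal, horizontal bottom stroke
LE_SLANTED = "\u2a7d"     # less-than-or-equal, slanted bottom stroke

# Map each code point to its formal character name.
names = {
    f"U+{ord(ch):04X}": unicodedata.name(ch)
    for ch in (LE_HORIZONTAL, LE_SLANTED)
}

# Even compatibility normalization keeps the two signs distinct.
still_distinct = (
    unicodedata.normalize("NFKC", LE_HORIZONTAL)
    != unicodedata.normalize("NFKC", LE_SLANTED)
)

print(names)
print(still_distinct)
```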

    > I don't see a solution for this "problem" within Unicode itself
    > (and neither in ISO/IEC 10646), unless a separate standard
    > is started to encode glyphs mapped to characters
    > (in the UCS-4 space, out of its 17 first planes?). For now the
    > safest way is to use specific fonts encoding these glyphs
    > in PUA positions, and bind these fonts to the abstract text
    > using stylesheets, meta information, or markup languages.
    > But with such a technique, the abstract text would be modified.
    > A way to avoid it is to surround the text with markup that
    > specifies an explicit substitution, like this in XML:
    > <typo as="&#xF001;">et</typo>,

    You probably don't want to start down the slippery slope of encoding
    Latin glyph variants as PUA characters. Check the archives of this
    mailing list; you will find that proposals to use the PUA to turn
    Unicode into a glyph registry are generally not well received.
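One practical consequence can be sketched with Python's `unicodedata` module: a PUA code point such as the U+F001 in the quoted markup has general category "Co" and no standard name, so a receiver without the private agreement (here, the specific font) cannot tell what it means. The helper `is_private_use` is a hypothetical name for this example.

```python
import unicodedata

def is_private_use(ch):
    """True if the character's general category is Co (private use)."""
    return unicodedata.category(ch) == "Co"

pua_char = "\uf001"  # the code point used in the quoted <typo> example

pua_flag = is_private_use(pua_char)
pua_name = unicodedata.name(pua_char, None)  # None: no standard name exists
amp_flag = is_private_use("&")

print(pua_flag, pua_name, amp_flag)  # True None False
```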

    -Doug Ewell
     Fullerton, California
