Re: Ligatures fi and ffi

From: Hans Aberg (
Date: Thu Jun 02 2005 - 12:32:45 CDT


    At 08:04 -0700 2005/06/02, Doug Ewell wrote:
    >Glyphs in a font do not have to be associated 1-to-1 with Unicode code
    >points. Indeed, they must not, if they are able to handle certain
    >context-dependent scripts.
    >There is no need to encode additional precomposed Latin ligatures, and
    >they will not be encoded.

    It seems to me that, when considering a new glyph, one should try to
    figure out whether it has any semantic value; if not, it should
    probably not be added. Figuring out the semantic value is sometimes
    easy, sometimes difficult. For example, in some scripts the ligature
    "ae" is a separate letter, and it should obviously be added on that
    basis. In English, this ligature is interchangeable with the letter
    combination "ae", so on that basis alone one might at first think it
    should not be added. But using the ligature communicates the
    etymology of the word through its spelling, and this is a kind of
    semantic value. One must then judge how important this value is, and
    whether it is enough to justify an addition.

    Turning to the glyph "fi", most readers would not even notice it is
    there, so its semantic value is zero: it is only a rendering
    technique. In math, however, any glyph could in principle be used. A
    mathematician could pick the glyph "fi", or another ligature, and
    assign it a special meaning. This is unlikely, though, and even if
    someone did so with some glyph, the semantic value might not be
    considered enough to justify adding it to the Unicode character set.
    The mathematician in question could do what mathematicians have often
    done in the past: pick another glyph, and the writing would not
    suffer in semantic presentation. It could, of course, happen that
    some glyphs become common and widely accepted, and should be added on
    that principle. One example is the MATHEMATICAL DOUBLE-STRUCK
    letters. These originally existed only for a few capital letters,
    used for common standard sets so that the plain letters were freed
    for other uses. They are often hated by typographers, it seems, who
    find them ugly, but loved by mathematicians because of their
    usefulness. Gradually, mathematicians wanted all the letters of the
    alphabet added in this series. Because of this gradual process, these
    letters have somewhat odd Unicode code points, not in adjacent
    positions.
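    The scattered code points can be inspected with Python's standard
    unicodedata module. This is a minimal sketch; the helper name
    double_struck is hypothetical. It assumes, as the Unicode Character
    Database records, that the seven capitals encoded earlier in the
    Letterlike Symbols block (U+2100 to U+214F) are not duplicated in the
    later Mathematical Alphanumeric Symbols block, where their slots are
    left unassigned.

```python
import string
import unicodedata


def double_struck(letter: str) -> int:
    """Return the code point of the double-struck form of an ASCII capital.

    Hypothetical helper for illustration only.
    """
    try:
        # Most capitals live in the Mathematical Alphanumeric Symbols block.
        ch = unicodedata.lookup(f"MATHEMATICAL DOUBLE-STRUCK CAPITAL {letter}")
    except KeyError:
        # The earlier-encoded capitals live in Letterlike Symbols; their
        # slots in the later block are unassigned, so the lookup fails there.
        ch = unicodedata.lookup(f"DOUBLE-STRUCK CAPITAL {letter}")
    return ord(ch)


# Capitals whose double-struck form sits outside the later block.
letterlike = [c for c in string.ascii_uppercase if double_struck(c) < 0x1D400]
print(letterlike)

# A, B, D, E are contiguous in the later block; C jumps back to U+2102.
for c in "ABCDE":
    print(c, f"U+{double_struck(c):04X}")
```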

    So there are a number of principles and judgements involved, and they
    may evolve slowly over time.

    >You can bet that the keepers of the Unicode Standard will not
    >"re-invent" it by renouncing the core technical principles that have
    >guided them for 14 years. This kind of "thinking outside the box" is
    >highly prized in marketing and industry invention, but it is a death
    >blow for an interoperable standard.

    The current Unicode character set is a mixed bag, assembled
    empirically rather than derived from general principles applied to
    each particular case at hand.

    Clearly, computer technology will evolve. This is very apparent in the
    case of ligatures: the ligatures needed in the past are no longer
    needed in more advanced rendering systems.

    There are different ways to cope with such changes. One way is to
    declare that the Unicode character set is as it is, and then design a
    new character set, upwards compatible in some way with the current
    one, that removes and streamlines the parts not needed by more
    advanced computer technology.

    But it is also entirely possible to accommodate such changes within
    the current character set, simply by adding the new features and
    recording the intended usage by means of property fields. In some
    sense this is simpler, as one will want access to all Unicode
    characters anyway, for backwards compatibility. In the case of the
    ligature "fi", one might add a redirection to the letter combination
    "fi". In the case of the ligature "ae", one could not do so, as it
    cannot always be replaced. If there is some script where this
    replacement can always be made, one can add information about that,
    say via special script abstract characters, so that the redirection
    can take place. This way, the ligatures already in the Unicode set
    that are used only for rendering purposes can gradually be put out of
    use, resulting in a cleaner, more semantically oriented core.
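    For comparison, the Unicode Character Database already carries
    something close to such a redirection for the "fi" and "ffi"
    ligatures (U+FB01, U+FB03): a compatibility decomposition to the
    plain letters, which NFKC normalization applies, while "ae" (U+00E6)
    has no such mapping. This is a minimal Python sketch using only the
    standard unicodedata module; the helper name compat_redirect is
    hypothetical, and it illustrates existing data rather than the
    property-field mechanism proposed above.

```python
import unicodedata
from typing import Optional


def compat_redirect(ch: str) -> Optional[str]:
    """Return the plain-letter sequence a character maps to, if any.

    Hypothetical helper: relies on the compatibility decomposition in the
    Unicode Character Database, where a "<compat>" tag marks a mapping
    that only discards formatting (such as ligation).
    """
    decomp = unicodedata.decomposition(ch)  # "" when there is none
    if not decomp.startswith("<compat>"):
        return None
    # NFKC applies the compatibility mapping (and recomposes afterwards).
    return unicodedata.normalize("NFKC", ch)


# Two presentation ligatures and the letter "ae" discussed in the thread.
for ch in ["\uFB01", "\uFB03", "\u00E6"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), "->", compat_redirect(ch))
```

    The "fi" and "ffi" ligatures fold to plain letters, while "ae" is
    left alone because its decomposition field is empty.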

    Perhaps the picture I have described above lies too far in the
    future for some to focus on. But it seems to me that others are
    already thinking along these lines. So it will happen; the question
    is only how.

       Hans Aberg

    This archive was generated by hypermail 2.1.5 : Thu Jun 02 2005 - 12:35:39 CDT