Ligatures fi and ffi

From: Jukka K. Korpela (jkorpela@cs.tut.fi)
Date: Wed Jun 01 2005 - 06:17:36 CDT

    (I took the liberty of changing the Subject, since this isn't really about
    "Glagolitic in Unicode 4.1" any more.)

    On Tue, 31 May 2005, Philippe Verdy wrote:

    > From: "Страхиња Радић" <vilinkamen@mail.ru>
    > > By using this kind of reasoning, we would end up asking why the heck
    > > was ``fi'' or ``ffi'' encoded when these two can be expressed with their
    > > corresponding atoms
    >
    > Today, they would not be encoded.

    I think they would be encoded even today, due to their presence in other
    character codes. But they would not be encoded, and would not have been
    encoded, without such background.

    > - - ligature processing is a required feature to support
    > even legacy ISO 8859 charsets like Arabic, or Indian standard charsets
    > (ISCII).

    Pardon? In which sense is ligature processing _required_? Do you mean that
    it is forbidden now to render "f" followed by "i" as two letters, without
    using a ligature? I don't see how an application would even be required to
    be _capable_ of using a ligature.

    > Unicode however cannot remove those characters.

    That's certainly true, due to the policy of never removing any characters.

    > They remain there for
    > compatibility, they are not recommended,

    Is there any explicit statement in the Unicode standard that says that the
    ligatures should not be used?

    > they are considered compatibility
    > characters with canonical decompositions,

    No, characters like U+FB01 LATIN SMALL LIGATURE FI have _compatibility_
    decompositions. This means that replacing a ligature with the
    decomposition may remove formatting information - as it surely does.
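
The distinction can be checked against the Unicode Character Database. Python's standard-library unicodedata module returns a character's decomposition mapping. A mapping that starts with a tag such as <compat> is a compatibility decomposition; a canonical mapping has no tag. This is a minimal sketch, and the helper name decomposition_kind is hypothetical:

```python
import unicodedata

def decomposition_kind(ch: str) -> str:
    """Classify a single character's decomposition mapping (hypothetical helper)."""
    mapping = unicodedata.decomposition(ch)  # "" when the character has no mapping
    if not mapping:
        return "none"
    # Compatibility mappings begin with a formatting tag like <compat>, <font>, <wide>;
    # canonical mappings are just a list of code points.
    return "compatibility" if mapping.startswith("<") else "canonical"

for ch in ("\ufb01", "\ufb03", "\u00e9"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch),
          decomposition_kind(ch))
```

For U+FB01 the mapping is "<compat> 0066 0069", whereas U+00E9 (e with acute) has the untagged canonical mapping "0065 0301".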

    > and not part of normalized forms,

    They are part of normalization forms C and D, which involve canonical
    decomposition but not compatibility decomposition.
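
A minimal Python sketch (using the standard unicodedata.normalize) of how the four normalization forms treat the ligatures. The sample string and the helper normalized_forms are illustrative assumptions, not taken from the thread:

```python
import unicodedata

# "find the efficient" spelled with U+FB01 (fi) and U+FB03 (ffi).
text = "\ufb01nd the e\ufb03cient"

def normalized_forms(s: str) -> dict[str, str]:
    """Apply each of the four Unicode normalization forms to s."""
    return {form: unicodedata.normalize(form, s) for form in ("NFC", "NFD", "NFKC", "NFKD")}

for form, result in normalized_forms(text).items():
    # NFC and NFD use only canonical mappings, so the ligatures survive;
    # NFKC and NFKD also apply compatibility mappings and replace them.
    print(f"{form}: unchanged={result == text}")
```

NFC and NFD leave the string untouched, while NFKC and NFKD turn it into plain "find the efficient".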

    > because their plain-text semantic is strictly equal to the semantic of their
    > decomposition in any human languages that use them.

    There is a difference in meaning as regards rendering: U+FB01 very
    clearly says it's a ligature, whereas "fi" may or may not be rendered as a
    ligature.
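
The point can be illustrated in Python: U+FB01 is a single character distinct from the two-character sequence "fi", and only compatibility folding (NFKC here) makes them compare equal, at which point the explicit ligature information is gone. A minimal sketch; the helper same_after_nfkc is hypothetical, and using NFKC for comparison is just one common choice:

```python
import unicodedata

ligature = "\ufb01"  # one character: LATIN SMALL LIGATURE FI
letters = "fi"       # two characters: f, i

def same_after_nfkc(a: str, b: str) -> bool:
    """Compare two strings after compatibility normalization (hypothetical helper)."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)

print(len(ligature), len(letters))         # 1 2
print(ligature == letters)                 # False: different code point sequences
print(same_after_nfkc(ligature, letters))  # True: folding erases the ligature
```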

    -- 
    Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
    


    This archive was generated by hypermail 2.1.5 : Wed Jun 01 2005 - 06:21:14 CDT