Re: Medievalist ligature character in the PUA

From: Asmus Freytag (
Date: Tue Dec 15 2009 - 11:00:57 CST


    On 12/15/2009 5:28 AM, Doug Ewell wrote:
    > Jeroen Ruigrok van der Werven <asmodai at in dash nomine dot org> wrote:
    >> Actually ij is unbreakable from a language point of view. You cannot
    >> hyphenate any words using it like blijdschap into bli-jdschap.
    > I'm not sure this particular argument proves what you want it to
    > prove. In English you cannot insert a hyphen between the T and H in
    > "bother," or between the S and H in "fishing." But that says nothing
    > about whether the two characters are considered a single letter in
    > English, or whether they should or must be written as a ligature.
    > Your other arguments are more convincing.
    I think it serves as a sort of sufficient condition: if you can insert a
    hyphen, then the thing is not unbreakable. (But the implication doesn't
    work in the opposite direction.)
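
    The asymmetry can be illustrated with a small sketch. It assumes the
    permitted hyphenation points of a word are already known. The break
    positions below are supplied by hand for illustration and do not come
    from any dictionary or hyphenation library, and the function name
    `pair_evidence` is hypothetical. A break inside a letter pair shows the
    pair is breakable; the absence of such a break is inconclusive.

```python
def pair_evidence(word: str, breaks: list[int], pair: str) -> str:
    """Classify what hyphenation data says about whether `pair` is a unit.

    `breaks` holds indices at which a hyphen may be inserted: a break at
    index k splits the word into word[:k] and word[k:].
    """
    start = word.find(pair)
    while start != -1:
        # A break strictly inside this occurrence of the pair splits it.
        if any(start < k < start + len(pair) for k in breaks):
            return "breakable"
        start = word.find(pair, start + 1)
    # No observed split: the pair may or may not be a unit.
    return "inconclusive"

# Break positions assumed for illustration.
print(pair_evidence("blijdschap", [5], "ij"))  # inconclusive ("blijd-schap")
print(pair_evidence("blijdschap", [3], "ij"))  # breakable (hypothetical "bli-jdschap")
print(pair_evidence("bother", [4], "th"))      # inconclusive ("both-er")
```

    The function deliberately returns "inconclusive" rather than
    "unbreakable": hyphenation data alone can only establish the one
    direction of the implication.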

    Whether a single entity in a writing system gets encoded as a singleton
    or as a code sequence is initially a matter of choice. A sequence is,
    in principle, just as good a representation of an entity as a single
    code value (though, in practice, it may require more or different
    support in an implementation). The real issue arises when you look at
    what the elements of the sequence encode by themselves.
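
    Unicode itself contains both choices for the Dutch ij: the singleton
    U+0133 LATIN SMALL LIGATURE IJ, and the plain sequence U+0069 U+006A.
    A minimal check with Python's standard `unicodedata` module shows that
    the singleton carries only a compatibility decomposition to the
    sequence, so NFC keeps the two distinct while NFKC folds them together.
    The elements of the sequence, meanwhile, are just ordinary i and j.

```python
import unicodedata

singleton = "\u0133"       # LATIN SMALL LIGATURE IJ
sequence = "\u0069\u006A"  # LATIN SMALL LETTER I + LATIN SMALL LETTER J

print(unicodedata.name(singleton))           # LATIN SMALL LIGATURE IJ
print(unicodedata.decomposition(singleton))  # <compat> 0069 006A

# NFC leaves the singleton alone; NFKC replaces it with the sequence.
print(unicodedata.normalize("NFC", singleton) == sequence)   # False
print(unicodedata.normalize("NFKC", singleton) == sequence)  # True

# Each element of the sequence is an ordinary letter on its own, so the
# sequence is indistinguishable from any other "i" followed by "j".
print([unicodedata.name(c) for c in sequence])
# ['LATIN SMALL LETTER I', 'LATIN SMALL LETTER J']
```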

    If Unicode had encoded "left half of ligature oe" and "right half of
    ligature oe" then these two code points in sequence would be
    distinguishable from the sequence of "o" and "e", even though the
    ligature-derived entity is not coded with a single code value (in this
    hypothetical example).
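
    The hypothetical can be sketched in a few lines. Unicode encodes no
    half-ligature characters, so the two code points below are arbitrary
    Private Use Area placeholders (U+E000 and U+E001, chosen for
    illustration only), and the constant names are likewise invented. The
    ligated entity is still a two-code-point sequence, yet it never compares
    equal to, or matches a search for, the plain letters o and e.

```python
# Hypothetical code points: Unicode encodes no "half ligature" characters.
# U+E000 and U+E001 are arbitrary Private Use Area placeholders.
LEFT_HALF_OE = "\ue000"
RIGHT_HALF_OE = "\ue001"

ligated = LEFT_HALF_OE + RIGHT_HALF_OE  # entity coded as a two-code-point sequence
plain = "oe"                            # ordinary letter o followed by letter e

print(len(ligated), len(plain))  # 2 2: both are sequences
print(ligated == plain)          # False: still distinguishable in plain text

# A search for the plain letters does not match the ligated spelling.
word_ligated = "f" + ligated + "tus"
print("oe" in word_ligated)      # False
```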

    If a letter pair has special behavior in a particular language, you can
    place the burden of identifying that pair either on the user or on the
    implementation. If the pair is always treated as a unit, asking the
    user to mark it is unnecessary (think of the lam-alif ligature in
    Arabic).
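
    Because lam followed by alef is always ligated in Arabic rendering, an
    implementation can find every instance without any user markup. The
    sketch below assumes lam is U+0644 and treats U+0622, U+0623, U+0625
    and U+0627 as the alef forms; the function name is hypothetical, and it
    ignores intervening combining marks, which a real shaping engine would
    have to handle. It also checks that the presentation-form ligature
    U+FEFB is only a compatibility variant of the two-letter sequence.

```python
import unicodedata

LAM = "\u0644"
# Alef and the alef variants that have lam-alef presentation-form ligatures.
ALEF_FORMS = {"\u0622", "\u0623", "\u0625", "\u0627"}

def lam_alef_positions(text: str) -> list[int]:
    """Return indices where a lam is immediately followed by an alef form.

    Simplification: combining marks between the two letters are not
    skipped, although a real shaping engine must handle them.
    """
    return [
        i for i in range(len(text) - 1)
        if text[i] == LAM and text[i + 1] in ALEF_FORMS
    ]

word = "\u0633\u0644\u0627\u0645"  # salam: seen, lam, alef, meem
print(lam_alef_positions(word))    # [1]

# The presentation-form ligature decomposes to the plain two-letter pair.
print(unicodedata.decomposition("\uFEFB"))                        # <isolated> 0644 0627
print(unicodedata.normalize("NFKC", "\uFEFB") == LAM + "\u0627")  # True
```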

    Otherwise, if the implementation can't correctly identify when a pair is
    special, you have no choice but to give the user a means to identify it.
    If the distinction is orthographic, it belongs in the encoding;
    otherwise, it can live in metadata.
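
    Unicode does provide one such user-level means in the text itself:
    ZERO WIDTH NON-JOINER (U+200C), placed between two letters to request
    that they not be ligated. The sketch below assumes the writer (or some
    dictionary) has already supplied the morpheme boundaries, since the
    implementation cannot infer them, and inserts U+200C at any boundary
    that splits a pair from an illustrative, non-exhaustive list of
    ligature-prone pairs. The function name and pair list are hypothetical.
    The German compound "Auflage" (Auf + lage), where an f-l ligature
    across the boundary is traditionally avoided in typesetting, serves as
    the example.

```python
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
# Illustrative set of pairs that fonts commonly ligate; not exhaustive.
LIGATURE_PAIRS = {"fl", "fi", "ff", "ft"}

def block_ligatures(word: str, boundaries: list[int]) -> str:
    """Insert ZWNJ at user-supplied boundaries that split a ligature pair.

    `boundaries` are indices k such that word[:k] and word[k:] are separate
    morphemes; they must come from the user, not from the implementation.
    """
    out = []
    for i, ch in enumerate(word):
        if 0 < i and i in boundaries and word[i - 1:i + 1] in LIGATURE_PAIRS:
            out.append(ZWNJ)
        out.append(ch)
    return "".join(out)

marked = block_ligatures("Auflage", [3])  # boundary: Auf|lage
print(marked == "Auf\u200clage")          # True
print(block_ligatures("Pflicht", []))     # Pflicht: no boundary, ligature allowed
```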

    The problem arises in situations that are not purely one or the other.
    Whether to use ligation in a text at all is a stylistic choice, but
    ligation in specific words can be prohibited in ways that are
    orthographic (and not


    This archive was generated by hypermail 2.1.5 : Tue Dec 15 2009 - 11:03:02 CST