Re: IJ joint in spaced lettering

From: Asmus Freytag
Date: Mon Jan 09 2006 - 17:24:29 CST

    On 1/9/2006 11:30 AM, Jukka K. Korpela wrote:

    > On Mon, 9 Jan 2006, Kent Karlsson wrote:
    >>> Theoretically, U+0132 is a compatibility character with U+0049 U+004A
    >>> as the compatibility decomposition.
    >> It has the *standardised* (non-theoretical) decomposition: <compat> 0049
    >> 004A.
    > The word "Theoretically" meant that I first considered how things are
    > in principle, by the Unicode standard.

    "Is theoretically" and "is defined as" are different "in principle",
    but it seems the latter is what you meant.
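
    The defined decomposition can be checked directly. The following is a
    minimal sketch, assuming Python and its standard unicodedata module
    (neither is part of this discussion); the variable names are only
    illustrative.

```python
import unicodedata

IJ_LIGATURE = "\u0132"  # LATIN CAPITAL LIGATURE IJ

# Raw decomposition field from the Unicode Character Database.
decomposition = unicodedata.decomposition(IJ_LIGATURE)
print(decomposition)  # <compat> 0049 004A

# Canonical normalization (NFD) leaves the character alone...
nfd = unicodedata.normalize("NFD", IJ_LIGATURE)
# ...while compatibility normalization (NFKD) replaces it with I + J.
nfkd = unicodedata.normalize("NFKD", IJ_LIGATURE)
print(nfd == IJ_LIGATURE, nfkd)  # True IJ
```

    The <compat> tag is what makes this a compatibility mapping rather
    than a canonical one, which is why only the NFKD/NFKC forms change
    the character.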

    >>> Being a compatibility decomposable
    >>> character, it is not recommended except in the representation
    >> No, it does not say that.
    > "Compatibility decomposable characters are a subset of compatibility
    > characters included in the Unicode Standard to represent distinctions
    > in other base standards. They support transmission and processing of
    > legacy data. Their use is discouraged other than for legacy data or
    > other special circumstances."
    > Definition D21 in section 3,
    >> There are exceptions to that interpretation
    >> of compatibility characters (and compatibility decomposable characters),
    >> the IJ LIGATURE and the LONG S are among them. I think it is perfectly
    >> fine to recommend their use in situations like this
    > I think so too; we seem to agree on the practical point. But I
    > discussed what the standard says (in a somewhat odd place, but the
    > same general idea can be seen elsewhere in the standard, too).

    I agree that the language as written is too strong. The problem is that
    such statements are perfectly fine for a large set of these characters,
    but totally inappropriate for the bulk of them, unless, that is, your
    definition of 'special circumstances' is totally elastic.

    The history of this statement is interesting. It was first introduced in
    2.0, without any discouragement expressed. The discouragement was added
    in 3.0, and the reservation for 'special circumstances' followed in 4.0.

    The number of compatibility characters in the standard has changed over
    the years:

    3.0.0 2237 (of which <font> 37, <super/sub> 63, <compat> 660)
    3.1.0 3230 (of which <font> 1028, <super/sub> 63, <compat> 662)
    3.2.0 3282 (of which <font> 1037, <super/sub> 64, <compat> 669)
    4.0.0 3363 (of which <font> 1038, <super/sub> 124, <compat> 673)
    4.1.0 3422 (of which <font> 1041, <super/sub> 169, <compat> 673)
    5.0.0 beta 3424 (of which <font> 1043, <super/sub> 169, <compat> 673)

    In other words, over time, about 1,000 new compatibility characters with
    <font> type decompositions have been added and about 100 with
    <super/sub>. These are characters that form an integral part of
    mathematical and phonetic notation, a use that is certainly 'specialized'
    compared to general text use, but perhaps ill-described by the use of
    the term 'special circumstance' in the text.
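
    Counts of this kind can be reproduced by tallying the decomposition
    tags in the character database. The sketch below assumes Python's
    standard unicodedata module, so it reports figures for whatever
    Unicode version that module ships with, not for the versions listed
    above; count_compat_tags is a hypothetical helper name.

```python
import sys
import unicodedata
from collections import Counter

def count_compat_tags():
    """Count characters by the <tag> of their compatibility decomposition."""
    counts = Counter()
    for cp in range(sys.maxunicode + 1):
        decomp = unicodedata.decomposition(chr(cp))
        # Compatibility decompositions start with a tag such as "<font>";
        # canonical decompositions have no tag and are skipped here.
        if decomp.startswith("<"):
            tag = decomp.split(" ", 1)[0]
            counts[tag] += 1
    return counts

counts = count_compat_tags()
# Unicode version of the local database, then the total of all tagged mappings.
print(unicodedata.unidata_version, sum(counts.values()))
# The three groups broken out in the table above.
print(counts["<font>"], counts["<super>"] + counts["<sub>"], counts["<compat>"])
```

    The total includes every tagged decomposition type, not only the
    three printed on the second line.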

    >> ZWJ could be used to "recommend" the use of a typographic ligature, but
    >> should not (IMO) be used to form *orthographic* ligatures
    > Such a distinction does not exist in the Unicode standard, and as you
    > mention, the IJ ligature would be a borderline case anyway.

    Typographically there is a clear difference between a ligature and a
    digraph. ZWJ - if implemented - in a Latin rendering engine would attempt
    to locate a ligated glyph. If the font lied and presented the digraph as
    a ligature, you might get what you want. However, since there is a
    *code point* for the IJ, fonts would most likely not offer a ligature glyph.

    Therefore, the use of ZWJ would have no effect, other than to introduce
    potential problems in all those rendering engines that do not
    support it for Latin.
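
    Apart from rendering, the joiner also remains in the text as a code
    point of its own. A small sketch, again assuming Python's unicodedata
    module, shows that I + ZWJ + J is never folded to plain IJ or to
    U+0132 by normalization:

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER

joined = "I" + ZWJ + "J"  # request for a ligature: three code points
ligature = "\u0132"       # the encoded IJ character: one code point

print(unicodedata.name(ZWJ), unicodedata.category(ZWJ))  # ZERO WIDTH JOINER Cf

# The joiner has no decomposition, so normalization keeps it in place.
nfkc_joined = unicodedata.normalize("NFKC", joined)
print(len(nfkc_joined), nfkc_joined == "IJ")  # 3 False

# The encoded character, by contrast, folds to plain I + J under NFKC.
nfkc_ligature = unicodedata.normalize("NFKC", ligature)
print(nfkc_ligature == "IJ")  # True
```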

    > Especially considering the classification of the ij ligature as a
    > letter in Dutch, we might say that it should really have been defined
    > as a primary (non-compatibility) character, much the same way as the
    > oe ligature and the ae ligature (which is now even called "letter ae",
    > not a ligature, though it's still effectively used as a ligature, too).
    > But it's too late to change that now. (Maybe some official statement,
    > constituting an explicit exception to the principle of avoiding
    > compatibility decomposable characters, would be in order.)

    The problem with the IJ is that you end up with both usages, as i+j will
    give the intended result in many cases, and since an IJ key is lacking
    on most keyboards, i+j is what people will enter. As the example shows,
    i+j will not give the intended result in some cases, so people will use
    ĳ or Ĳ (U+0133 or U+0132) to ensure that the case or spacing is what
    they want. About the only thing that can be done is to document that
    thoroughly so that search engines and databases can do the right thing.
    (For example, I assume, but have not verified, that i+j and ĳ in fact
    sort the same in the DUCET).
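
    Both halves of this can be illustrated with a short sketch, assuming
    Python and its standard unicodedata module. search_key and letterspace
    are hypothetical helpers written for this illustration, and the
    NFKC-plus-casefold key is one possible way to match the two spellings,
    not a statement about how any particular search engine or the DUCET
    behaves.

```python
import unicodedata

def search_key(text):
    """Hypothetical helper: fold compatibility and case differences."""
    return unicodedata.normalize("NFKC", text).casefold()

def letterspace(text, gap=" "):
    """Hypothetical helper: naive spaced lettering, one gap per code point."""
    return gap.join(text)

two_letters = "ijsselmeer"     # i + j as typed on most keyboards
one_letter = "\u0133sselmeer"  # starts with U+0133 LATIN SMALL LIGATURE IJ

# Case and spacing differ between the two spellings...
print(two_letters.capitalize())                     # Ijsselmeer
print(one_letter.capitalize() == "\u0132sselmeer")  # True
print(letterspace(two_letters[:3]))                 # i j s
print(letterspace(one_letter[:2]) == "\u0133 s")    # True (IJ stays whole)

# ...but both spellings fold to the same key for matching.
print(search_key(two_letters) == search_key(one_letter))         # True
print(search_key("IJsselmeer") == search_key("\u0132sselmeer"))  # True
```

    The key is meant for comparison only; the stored text keeps whichever
    spelling the author chose.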

