Re: Dutch IJ & the austrian stamp's encoding

From: Asmus Freytag (
Date: Wed Jan 18 2006 - 00:15:00 CST

    On 1/17/2006 3:55 PM, André Szabolcs Szelp wrote:

    >The discussion about IJ at the beginning asked whether
    >I + ZWJ + J
    >would be the proper way to get rid of the letter-spacing
    >in typesetting.
    >ZWJ is for indicating graphical ligatures. On the other
    >hand, I'd like to draw your attention to the already
    >existing U+034F COMBINING GRAPHEME JOINER (CGJ),
    >which seems to be meant exactly for such a use:
    >it indicates that two glyphs form one grapheme (and not a
    >ligature). That's exactly what the Dutch IJ is:
    >a single grapheme, and in fact not a ligature.
    >This information can be used by the application
    >for, e.g., correct collation or spacing.
    CGJ is now discouraged from having any visible effect.
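    To make the competing encodings concrete, here is a small Python sketch
    using only the standard `unicodedata` module (the helper names `describe`
    and `hex_points` are illustrative, not from any specification). It shows
    only how the candidates behave under normalization: ZWJ and CGJ survive
    as separate code points, while NFKC folds the compatibility character
    U+0132 to plain "IJ". It says nothing about how a given font or layout
    engine renders any of them.

```python
import unicodedata


def hex_points(text: str) -> str:
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def describe(text: str) -> dict:
    """Summarize a string's code points, character names and normalized forms."""
    return {
        "code_points": hex_points(text),
        "names": [unicodedata.name(ch) for ch in text],
        "nfc": unicodedata.normalize("NFC", text),
        "nfkc": unicodedata.normalize("NFKC", text),
    }


# Candidate encodings of the Dutch capital IJ discussed in this thread.
CANDIDATES = {
    "I + J": "IJ",
    "I + ZWJ + J": "I\u200dJ",
    "I + CGJ + J": "I\u034fJ",
    "U+0132 (compatibility)": "\u0132",
}

if __name__ == "__main__":
    for label, text in CANDIDATES.items():
        info = describe(text)
        # Only U+0132 changes under NFKC; the joiner sequences pass through intact.
        print(f"{label:24} {info['code_points']:22} NFKC: {hex_points(info['nfkc'])}")
```

    Neither joiner sequence becomes equivalent to U+0132 under any
    normalization form, so the choice between them rests on what each joiner
    is specified to do, not on equivalence.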

    >Note, for example, that the same letter-spacing behaviour
    >can be observed for German "ch" or "tz" when typeset in
    >Fraktur (though not in roman fonts).
    The ch definitely is a ligature in Fraktur, in fact a mandatory one. The
    tz probably is as well, but I would have to look it up to be certain.
    The rule in Fraktur is simply to not break ligatures (when adding letter
    spacing). Fraktur mimics the manuscript tradition to some extent, and is
    consciously trying to conserve horizontal extent (you can write more
    words per line in Fraktur than in Roman). Hand in hand with that goes
    the extensive use of (mandatory) ligatures.

    >It's another question that U+034F is not supported by most
    >applications, but it's still the correct representation for
    >IJ in Dutch if you want to avoid compatibility characters.
    From the text of Unicode 4.1: (*)

    "U+034F combining grapheme joiner is used to affect the collation of
    adjacent characters for purposes of language-sensitive collation and
    searching, and to distinguish sequences that would otherwise be
    canonically equivalent."

    and from the existing text: (*)

    "For rendering, the combining grapheme joiner is invisible. However,
    some older implementations may treat a sequence of grapheme clusters
    linked by combining grapheme joiners as a single unit for the
    application of enclosing combining marks."

    (*) I copied both not from their original sources but from the current
    working draft of Unicode 5.0, which I happened to have open at the CGJ
    entry. It's possible that some of this reflects 5.0-specific edits, but
    in any case, this is how the CGJ will likely be described in the
    forthcoming version.

    This clearly does not support using the CGJ to create an IJ digraph.
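    For contrast, the effect the quoted text does assign to the CGJ can be
    shown in a few lines of Python (standard `unicodedata` module; the acute
    accent and dot below are chosen purely as illustrative marks). Because
    the CGJ has canonical combining class 0, placing it between two combining
    marks blocks canonical reordering, so two sequences that would otherwise
    be canonically equivalent stay distinct. That is an effect on
    normalization, not a request for a different glyph.

```python
import unicodedata

CGJ = "\u034f"        # U+034F COMBINING GRAPHEME JOINER, combining class 0
ACUTE = "\u0301"      # U+0301 COMBINING ACUTE ACCENT, combining class 230
DOT_BELOW = "\u0323"  # U+0323 COMBINING DOT BELOW, combining class 220


def nfd(text: str) -> str:
    """Canonical decomposition, including canonical reordering of marks."""
    return unicodedata.normalize("NFD", text)


# Without CGJ: the marks are sorted by combining class, so both spellings
# are canonically equivalent and normalize to the same string.
acute_then_dot = "a" + ACUTE + DOT_BELOW
dot_then_acute = "a" + DOT_BELOW + ACUTE

# With CGJ between the marks: its combining class of 0 stops reordering
# across it, so the two spellings remain distinct after normalization.
acute_cgj_dot = "a" + ACUTE + CGJ + DOT_BELOW
dot_cgj_acute = "a" + DOT_BELOW + CGJ + ACUTE

if __name__ == "__main__":
    print(nfd(acute_then_dot) == nfd(dot_then_acute))  # True
    print(nfd(acute_cgj_dot) == nfd(dot_cgj_acute))    # False
```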

    >Concerning the discussion about the Austrian stamp,
    >the text encoded is clearly O-umlaut, U-umlaut and
    >U, and the O-e, U-e, V forms are merely font issues.
    >How can you verify this? If the text were selected and
    >the font changed to, e.g., a Roman face, you would /not/ expect
    >to see O-e, U-e and V. On the other hand, in this
    >historicising font style the form of U is V and the
    >form of Ö is Oe. Period.
    I'd tend to agree. For a decorative font like the one used, such shape
    substitutions make sense, even though they would be a terrible choice for
    a general-purpose font that tried to support the entire range of writing
    with the Latin script.
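    The same conclusion can be checked at the encoding level. In this small
    Python sketch (standard `unicodedata` only; the helper
    `normalized_variants` and the `STAMP` pairs are illustrative), Ö, Ü and U
    are run through all four normalization forms, and none of them yields
    "Oe", "Ue" or "V". The O-e, U-e and V shapes therefore have to come from
    the font's glyph design rather than from the underlying text, which is
    exactly what the font-switching test in the quoted message relies on.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def normalized_variants(text: str) -> set:
    """All strings reachable from `text` via the four Unicode normalization forms."""
    return {unicodedata.normalize(form, text) for form in FORMS}


# Letters as one would encode the stamp text, paired with the shapes
# the decorative font displays for them.
STAMP = [("\u00d6", "Oe"), ("\u00dc", "Ue"), ("U", "V")]

if __name__ == "__main__":
    for encoded, shown in STAMP:
        variants = normalized_variants(encoded)
        # No normalization form turns the encoded letter into the displayed shape.
        print(encoded, sorted(variants), shown in variants)
```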



