Re: Why people still want to encode precomposed letters

From: John Hudson (
Date: Tue Nov 18 2008 - 14:43:13 CST


    Andreas Stötzner wrote:

    > It is highly unrealistic to assume that at least some of the principal
    > fonts will ever come with sufficient anchor point programming. A few
    > well-sponsored specialists like John Hudson may be lucky enough to labour
    > on this for months...

    It may be helpful to get some idea of the actual amount of work involved
    in adding GPOS mark attachment positioning for arbitrary base+mark
    sequences to an OT font. Doing the job well on a typical family of four
    fonts (roman, italic, bold, bold italic) supporting three scripts
    (Latin, Greek, Cyrillic) and all the combining mark characters up to
    Unicode 5.0 should take between two and four weeks, depending on the
    nature of the design and how quickly one works.

    The marks need to be categorised based on shared anchor positions, e.g.
    above-centre, above-centre.cap*, below-centre, above-right, with
    separate anchors for ogoneks and other marks that attach to the base. In
    practical terms, the above-centre and below-centre anchors are the most
    important and will probably account for almost all the real-world uses.
    Even if one added only these to fonts, one would have gone a long way
    towards supporting arbitrary combinations that might occur in any text.
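    To make the idea of anchor categories concrete, here is a minimal Python
    sketch of mark-to-base attachment. The glyph names, anchor coordinates
    (in font units) and the two classes used are hypothetical; the only point
    illustrated is that each mark is shifted so that its anchor for a given
    class lands on the base's anchor for the same class, measured from the
    base glyph's origin.

```python
from dataclasses import dataclass

# Hypothetical anchor data in font units; real values come from the design.
# Each base glyph carries one anchor per mark class it supports.
BASE_ANCHORS = {
    "a": {"above-centre": (250, 520), "below-centre": (250, 0)},
    "o": {"above-centre": (280, 520), "below-centre": (280, 0)},
}


@dataclass(frozen=True)
class Mark:
    mark_class: str         # which base anchor this mark attaches to
    anchor: tuple[int, int]  # the mark's own anchor, relative to its origin


# Hypothetical zero-width combining marks, each in exactly one class.
MARKS = {
    "acutecomb": Mark("above-centre", (-120, 500)),
    "gravecomb": Mark("above-centre", (-130, 500)),
    "dotbelowcomb": Mark("below-centre", (-110, -20)),
}


def mark_offset(base: str, mark: str) -> tuple[int, int]:
    """Offset of the mark glyph's origin from the base glyph's origin,
    chosen so that the mark's anchor coincides with the base's anchor."""
    m = MARKS[mark]
    bx, by = BASE_ANCHORS[base][m.mark_class]
    mx, my = m.anchor
    return (bx - mx, by - my)


print(mark_offset("a", "acutecomb"))     # (370, 20)
print(mark_offset("o", "dotbelowcomb"))  # (390, 20)
```

    Because every mark in a class shares one anchor per base, adding another
    mark to the above-centre class needs only that mark's own anchor and no
    new base anchors, which is why a handful of classes goes so far.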

    * If, as I do, one has variant forms of marks for use above uppercase and
    other tall letters, these need to be substituted contextually in the
    <ccmp> GSUB feature. This means, of course, that one only needs to
    provide anchors for such mark variants above uppercase and tall letters,
    and not for the regular mark glyphs.
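    The contextual substitution in the footnote might be modelled as below.
    This is a Python sketch of the lookup's effect on a glyph sequence, not
    of the <ccmp> feature itself; the set of tall bases and the .cap glyph
    names are assumptions for illustration. Intervening marks (for example a
    below mark between the base and an above mark) are skipped when looking
    back for the base.

```python
# Hypothetical glyph inventory for illustration only.
TALL_BASES = {"A", "E", "O", "b", "d", "h", "k", "l"}
CAP_VARIANTS = {"acutecomb": "acutecomb.cap", "gravecomb": "gravecomb.cap"}
MARKS = {"acutecomb", "gravecomb", "dotbelowcomb"}


def substitute_cap_marks(glyphs):
    """Swap above marks for their .cap variants when their base is tall."""
    out = []
    base = None  # most recent non-mark glyph seen
    for g in glyphs:
        if g in MARKS:
            # Only marks that have a .cap variant are swapped.
            if base in TALL_BASES and g in CAP_VARIANTS:
                g = CAP_VARIANTS[g]
        else:
            base = g
        out.append(g)
    return out


print(substitute_cap_marks(["A", "acutecomb", "a", "acutecomb"]))
# ['A', 'acutecomb.cap', 'a', 'acutecomb']
```

    With such a substitution in place, tall bases only ever meet the .cap
    marks, so they need the above-centre.cap anchor rather than the regular
    above-centre one.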

    One needs to decide how to handle the precomposed diacritic characters
    as bases for additional marks. My preference is to contextually
    decompose these into simple bases plus marks when followed by a
    combining mark, e.g. (in VOLT syntax):

            Aacute -> A acutecomb
            | <MARK-any>

    [Note that I decompose when any mark follows, rather than just another
    above mark, since I ensure that the anchored mark position is the same
    as the composite accent position in the precomposed glyph: there should
    be no visual difference between the /Aacute/ precomposed glyph and the
    GPOS rendering of /A/acutecomb/.]
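    The effect of the VOLT rule above on a glyph sequence can be sketched in
    Python as follows. The decomposition table and the membership of the
    MARK-any class are hypothetical stand-ins; in a font the rule is compiled
    into a contextual GSUB lookup rather than run as code.

```python
# Hypothetical decomposition table: precomposed glyph -> base + mark glyphs.
DECOMPOSITIONS = {
    "Aacute": ["A", "acutecomb"],
    "aacute": ["a", "acutecomb"],
    "egrave": ["e", "gravecomb"],
}

# Hypothetical MARK-any class: every combining mark glyph in the font.
MARK_ANY = {"acutecomb", "gravecomb", "dotbelowcomb", "tildecomb"}


def decompose_before_marks(glyphs):
    """Mimic 'Aacute -> A acutecomb | <MARK-any>': decompose a precomposed
    glyph only when the next glyph is a combining mark."""
    out = []
    for i, g in enumerate(glyphs):
        nxt = glyphs[i + 1] if i + 1 < len(glyphs) else None
        if g in DECOMPOSITIONS and nxt in MARK_ANY:
            out.extend(DECOMPOSITIONS[g])
        else:
            out.append(g)
    return out


print(decompose_before_marks(["Aacute", "dotbelowcomb"]))
# ['A', 'acutecomb', 'dotbelowcomb']
```

    A precomposed glyph that is not followed by a mark is left intact, so
    ordinary text keeps using the designed /Aacute/.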

    This approach greatly reduces the number of bases on which one needs to
    define anchors, and so simplifies the work. However, if one is concerned
    about layout engines (including at least some of Adobe's) that have
    problems processing one-to-many glyph substitutions (decompositions),
    then one will need to put anchors on all potential bases, including
    precomposed glyphs such as /Aacute/. This is significantly more work.

    As I wrote earlier, I think the bottleneck in increasing support for
    GPOS mark positioning is a workflow and tool issue. While it is nice to
    have real-world test cases, and hence some knowledge of which base+mark
    sequences will actually occur in text, the benefit of the GPOS anchor
    approach is that it does not rely on such knowledge: it can, and should,
    handle arbitrary combinations. What font developers need to figure out
    is how to leverage existing data to derive GPOS anchor positions and/or
    to automate parts of the workflow. Since one generally wants GPOS mark
    positioning to mimic positioning within precomposed glyphs accurately,
    one obvious way to do this is to leverage component x,y offset positions
    as GPOS anchor locations.
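    A minimal Python sketch of deriving base anchors from composite data,
    assuming each composite lists its base first and its marks after it,
    and that every mark glyph already carries its own anchor. If a mark
    component sits at offset (dx, dy), the mark's anchor is (mx, my) and the
    base sits at (bx, by), then the base anchor that reproduces the composite
    is (dx + mx - bx, dy + my - by). All glyph names and numbers are
    hypothetical.

```python
from collections import defaultdict

# Hypothetical composites: glyph -> [(component, x offset, y offset), ...],
# with the base component listed first.
COMPOSITES = {
    "aacute": [("a", 0, 0), ("acutecomb", 370, 20)],
    "agrave": [("a", 0, 0), ("gravecomb", 380, 20)],
    "adotbelow": [("a", 0, 0), ("dotbelowcomb", 360, 20)],
}

# Hypothetical mark data: mark glyph -> (mark class, anchor in the mark glyph).
MARK_ANCHORS = {
    "acutecomb": ("above-centre", (-120, 500)),
    "gravecomb": ("above-centre", (-130, 500)),
    "dotbelowcomb": ("below-centre", (-110, -20)),
}


def derive_base_anchors(composites, mark_anchors):
    """Collect, for each (base glyph, mark class), every base-anchor position
    implied by the composites."""
    found = defaultdict(set)
    for components in composites.values():
        base, bx, by = components[0]  # assumption: base component comes first
        for mark, dx, dy in components[1:]:
            mark_class, (mx, my) = mark_anchors[mark]
            # The mark's anchor in composite coordinates, made relative to the base.
            found[(base, mark_class)].add((dx + mx - bx, dy + my - by))
    return dict(found)


anchors = derive_base_anchors(COMPOSITES, MARK_ANCHORS)
print(anchors[("a", "above-centre")])  # {(250, 520)}
print(anchors[("a", "below-centre")])  # {(250, 0)}
```

    Collecting candidates into a set, rather than keeping the first one
    found, means that a base with more than one implied position for a class
    shows up at once as a set with several members.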

    Deriving anchors from component offsets is made very much easier if
    combining mark glyphs, rather than spacing accents, are used as
    components in precomposed diacritics: e.g. the /Aacute/ glyph should be
    a composite of /A/ and /acutecomb/ (U+0301), not /acute/ (U+00B4). This
    is contrary to the evolved practice of many font developers and to some
    tool preconceptions, but these are easily

    Obviously it is also necessary for mark components to be positioned
    consistently, both within their own zero-width glyphs and within
    composites sharing the same base. In other words, the x,y offsets of
    components such as /acutecomb/, /gravecomb/, /circumflexcomb/ and other
    above-centre marks should be identical when applied to the same base
    letter, such as /a/.
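    The consistency requirement lends itself to an automated check. The
    Python sketch below groups mark-component offsets by base letter and
    mark class and reports any group that uses more than one offset. It
    assumes, hypothetically, that all marks in a class are drawn on their
    zero-width glyphs so that identical offsets give identical placement;
    the composite data is invented for illustration.

```python
from collections import defaultdict

# Hypothetical composites: glyph -> (base, mark, x offset, y offset).
# acircumflex is deliberately inconsistent with the other above marks on "a".
COMPOSITES = {
    "aacute": ("a", "acutecomb", 250, 0),
    "agrave": ("a", "gravecomb", 250, 0),
    "acircumflex": ("a", "circumflexcomb", 265, 0),
    "eacute": ("e", "acutecomb", 270, 0),
}

# Hypothetical class assignment for each combining mark.
MARK_CLASS = {
    "acutecomb": "above-centre",
    "gravecomb": "above-centre",
    "circumflexcomb": "above-centre",
}


def offset_conflicts(composites, mark_class):
    """Return {(base, mark class): {composite: offset}} for every group whose
    composites place marks of that class at more than one offset."""
    groups = defaultdict(dict)
    for name, (base, mark, dx, dy) in composites.items():
        groups[(base, mark_class[mark])][name] = (dx, dy)
    return {key: g for key, g in groups.items() if len(set(g.values())) > 1}


print(list(offset_conflicts(COMPOSITES, MARK_CLASS)))  # [('a', 'above-centre')]
```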

    [This was not the case in recent font data I was working on for a
    client. It made the work of anchor definition much more difficult than
    it should have been, and I had to abandon the goal of always having the
    GPOS mark positioning mimic the positioning within the precomposed
    glyphs.]
    John Hudson

    Tiro Typeworks
    Gulf Islands, BC
    You can't build a healthy democracy with people
    who believe in little green men from Venus.
                        -- Arthur C. Clarke

    This archive was generated by hypermail 2.1.5 : Tue Nov 18 2008 - 14:45:51 CST