Re: Why people still want to encode precomposed letters

From: John Hudson ([email protected])
Date: Tue Nov 18 2008 - 14:43:13 CST

Next message: Andrew Cunningham: "Re: Why people still want to encode precomposed letters"

Previous message: John H. Jenkins: "Re: Why people still want to encode precomposed letters"
In reply to: Andreas Stötzner: "Re: Why people still want to encode precomposed letters"
Next in thread: Raymond Mercier: "Re: Why people still want to encode precomposed letters"

Andreas Stötzner wrote:

> It is highly unrealistic to assume that at least some of the principal
> fonts will come with sufficient anchor point programming ever. Few
> well-sponsored specialists like John Hudson may be so lucky to labour on
> this for months...

It may be helpful to get some idea of the actual amount of work involved
in adding GPOS mark attachment positioning for arbitrary base+mark
sequences to an OT font. Doing the job well for a typical family of four
fonts (roman, italic, bold, bold italic) supporting three scripts
(Latin, Greek, Cyrillic) and all the combining mark characters up to
Unicode 5.0 should take two to four weeks, depending on the nature of
the design and how quickly one works.

The marks need to be categorised based on shared anchor positions, e.g.
above-centre, above-centre.cap*, below-centre, above-right, with
separate anchors for ogoneks and other marks that attach to the base. In
practical terms, the above-centre and below-centre anchors are the most
important and will probably account for almost all the real-world uses.
Even if one added only these to fonts, one would have gone a long way to
supporting arbitrary combinations that might occur in any text.

* If, as I do, one has variant forms of marks for above uppercase and
other tall letters, these need to be substituted contextually in the
<ccmp> GSUB feature. This means, of course, that one only needs to
provide anchors for such mark variants above uppercase and tall letters,
and not for the regular mark glyphs.
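
As a rough illustration of the contextual substitution described in the
note above, here is a minimal Python sketch that simulates it on a list
of glyph names. It is not font code; the glyph names, the ".cap" suffix,
the set of tall bases and the function name are all assumptions made for
the example.

```python
# Hypothetical glyph inventory; names and the ".cap" suffix are illustrative only.
TALL_BASES = {"A", "E", "O", "b", "d", "h"}
CAP_VARIANTS = {"acutecomb": "acutecomb.cap", "gravecomb": "gravecomb.cap"}
OTHER_MARKS = {"dotbelowcomb", "ogonekcomb"}


def is_mark(glyph):
    """True for any combining mark glyph known to this sketch."""
    return glyph in CAP_VARIANTS or glyph in OTHER_MARKS


def substitute_cap_marks(glyphs):
    """Swap above marks for their cap variants when the preceding base is tall."""
    out = []
    after_tall_base = False
    for glyph in glyphs:
        if not is_mark(glyph):
            # A new base resets the context for the marks that follow it.
            after_tall_base = glyph in TALL_BASES
            out.append(glyph)
        elif after_tall_base and glyph in CAP_VARIANTS:
            out.append(CAP_VARIANTS[glyph])
        else:
            out.append(glyph)
    return out


print(substitute_cap_marks(["A", "acutecomb", "a", "acutecomb"]))
```

In this sketch an intervening below mark does not break the context, so
/E/dotbelowcomb/gravecomb/ still receives the cap variant of the grave.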

One needs to decide how to handle the precomposed diacritic characters
as bases for additional marks. My preference is to contextually
decompose these into simple bases plus marks when followed by a
combining mark, e.g. (in VOLT syntax):

Aacute -> A acutecomb
| <MARK-any>

[Note that I decompose when any mark follows, rather than just another
above mark, since I ensure that the anchored mark position is the same
as the composite accent position in the precomposed glyph: there should
be no visual difference between the /Aacute/ precomposed glyph and the
GPOS rendering of /A/acutecomb/.]
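
For readers who do not use VOLT, the effect of that lookup can be
simulated in a few lines of Python. This is only a sketch of the logic on
glyph-name lists, not something a font compiler would accept; the
decomposition table, the mark set and the function name are hypothetical.

```python
# Hypothetical data: which precomposed glyphs decompose, and which glyphs are marks.
DECOMPOSITIONS = {
    "Aacute": ("A", "acutecomb"),
    "eacute": ("e", "acutecomb"),
    "odieresis": ("o", "dieresiscomb"),
}
MARKS = {"acutecomb", "gravecomb", "dieresiscomb", "macroncomb", "dotbelowcomb"}


def decompose_before_marks(glyphs):
    """Decompose a precomposed glyph only when the next glyph is a mark."""
    out = []
    for i, glyph in enumerate(glyphs):
        next_is_mark = i + 1 < len(glyphs) and glyphs[i + 1] in MARKS
        if glyph in DECOMPOSITIONS and next_is_mark:
            out.extend(DECOMPOSITIONS[glyph])
        else:
            # No following mark: the precomposed glyph is left untouched.
            out.append(glyph)
    return out


print(decompose_before_marks(["Aacute", "dotbelowcomb", "B", "Aacute"]))
```

The first /Aacute/ is decomposed because a mark follows it; the second is
left as the precomposed glyph.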

This approach greatly reduces and simplifies the number of bases on
which one needs to define anchors. However, if one is concerned about
some layout engines (incl. at least some of Adobe's) that have problems
processing one-to-many glyph substitutions (decompositions), then one
will need to put anchors on all potential bases including precomposed
glyphs such as /Aacute/. This is significantly more work.

As I wrote earlier, I think the bottleneck on increasing support for
GPOS mark positioning is a workflow and tool issue. While it is nice to
have real-world test cases and hence some knowledge about what base+mark
sequences will actually occur in text, the benefit of the GPOS anchor
approach is that it does not rely on such knowledge: it can, and should,
be able to handle arbitrary combinations. What font developers need to
figure out are ways to leverage existing data to derive GPOS anchor
positioning and/or to automate parts of the workflow. One obvious way to
do this, since one generally wants GPOS mark positioning to accurately
mimic positioning within precomposed glyphs, is to leverage component
x,y offset positions as GPOS anchor locations.
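
To make the arithmetic concrete, here is a minimal Python sketch of
deriving a base anchor from a composite's component offset. It assumes
the base component sits at 0,0 in the composite and that each mark glyph
already carries its own anchor; all coordinates, glyph data and function
names are invented for the example and do not come from any real font or
tool.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


# Hypothetical anchor on each mark glyph, in the mark's own coordinate space.
MARK_ANCHORS = {"acutecomb": Point(-20, 480), "gravecomb": Point(-30, 480)}

# Hypothetical composite: base glyph, mark glyph, and the mark component's x,y offset.
COMPOSITES = {"Aacute": ("A", "acutecomb", Point(350, 220))}


def derive_base_anchor(composite):
    """Base anchor = component offset + the mark's own anchor."""
    base, mark, offset = COMPOSITES[composite]
    mark_anchor = MARK_ANCHORS[mark]
    return base, Point(offset.x + mark_anchor.x, offset.y + mark_anchor.y)


def gpos_mark_offset(base_anchor, mark):
    """Where attachment puts the mark's origin: base anchor minus mark anchor."""
    mark_anchor = MARK_ANCHORS[mark]
    return Point(base_anchor.x - mark_anchor.x, base_anchor.y - mark_anchor.y)


base, anchor = derive_base_anchor("Aacute")
print(base, anchor)
print(gpos_mark_offset(anchor, "acutecomb"))  # same as the component offset
print(gpos_mark_offset(anchor, "gravecomb"))  # a mark with no composite on this base
```

Attaching /acutecomb/ through the derived anchor reproduces the component
offset of the precomposed glyph, and the same anchor positions any other
mark of that class.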

This is made very much easier if combining mark glyphs, rather than
spacing accents, are used as components in precomposed diacritics, e.g.
the /Aacute/ glyph should be a composite of /A/ and /acutecomb/ (U+0301)
not /acute/ (U+00B4). This is contrary to the evolved practice of many
font developers and to some tool preconceptions, but these are easily
revised.

Obviously it is also necessary for mark components to be positioned
consistently, both on their own zero-widths and within composites with
the same base. In other words, the x,y offsets of components such as
/acutecomb/, /gravecomb/, /circumflexcomb/ and other above-centre marks
should be identical when applied to the same base letter, such as /a/.
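
A check of this kind is easy to automate before starting on anchors. The
following Python sketch groups mark component offsets by base and reports
the bases on which marks of one class disagree; the composite data, the
mark class and the function name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical mark class and composite data: (base, mark, (x, y) component offset).
ABOVE_CENTRE = {"acutecomb", "gravecomb", "circumflexcomb"}
COMPOSITES = {
    "aacute": ("a", "acutecomb", (250, 0)),
    "agrave": ("a", "gravecomb", (250, 0)),
    "acircumflex": ("a", "circumflexcomb", (250, 0)),
    "eacute": ("e", "acutecomb", (240, 0)),
    "egrave": ("e", "gravecomb", (236, 0)),
}


def inconsistent_bases(composites, mark_class):
    """Return bases whose marks of the given class use more than one offset."""
    offsets = defaultdict(set)
    for base, mark, offset in composites.values():
        if mark in mark_class:
            offsets[base].add(offset)
    return {base: sorted(found) for base, found in offsets.items() if len(found) > 1}


print(inconsistent_bases(COMPOSITES, ABOVE_CENTRE))
```

With this sample data only /e/ is reported, because its acute and grave
components sit at different x offsets.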

[This was not the case in recent font data I was working on for a
client. It made the work of anchor definition much more difficult than
it should have been, and I had to abandon the goal of always having the
GPOS mark positioning mimic the positioning within the precomposed
glyphs.]

John Hudson

-- 
Tiro Typeworks        www.tiro.com
Gulf Islands, BC      [email protected]
You can't build a healthy democracy with people
who believe in little green men from Venus.
                    -- Arthur C. Clarke


This archive was generated by hypermail 2.1.5 : Tue Nov 18 2008 - 14:45:51 CST