RE: Accessing alternate glyphs from plain text

From: Peter Constable
Date: Sat Aug 14 2010 - 14:45:11 CDT

    You are assuming that the application will automatically select certain alternate glyphs at the ends of lines. There are limits to what can be done here: automatically selecting a feature in a given context is easy for software to do; knowing that the result is aesthetically pleasing is not. The OpenType 'falt' feature, "Final Glyph on Line Alternates", can be used for your scenario. But it probably should be left to a user to select that manually rather than having software apply it automatically. Thus, you should have an additional step after #3 to apply that feature.

    Btw, not all fonts will support that feature; it is up to the font designer to choose which features they want to support.

    Re your steps 8-10, the ability for a PDF reader to recover the original character sequence depends on a few factors. A PDF file contains a description of the text in terms of glyph IDs; the original character information is optional, and not all PDF writers will insert it. Depending on the font, it may be possible to recover character information from glyph name data, but this can be problematic in some cases (especially for Indic scripts, since re-ordering information cannot be captured in glyph names). But the issues here are really orthogonal to the original topic, which was use of variation sequences or font features to select end-of-line alternates: if variation sequences were used, the same issues in retrieving the original character data from PDF would exist.
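
    As a rough illustration of the glyph-name route, here is a minimal Python sketch of the kind of naming convention a PDF consumer can fall back on: drop any suffix after the first period, split on underscores, and decode names of the form uniXXXX. The KNOWN_NAMES table and the function name are hypothetical stand-ins for a full glyph list, and the rules are simplified.

```python
import re

# Hypothetical stand-in for a full glyph-name list: basic Latin letters only.
KNOWN_NAMES = {chr(c): chr(c) for c in range(ord("a"), ord("z") + 1)}
KNOWN_NAMES.update({chr(c): chr(c) for c in range(ord("A"), ord("Z") + 1)})

# "uni" followed by one or more groups of four uppercase hex digits.
UNI_NAME = re.compile(r"uni((?:[0-9A-F]{4})+)$")


def glyph_name_to_text(name):
    """Guess the character string behind a glyph name (simplified rules)."""
    base = name.split(".", 1)[0]           # "e.end" -> "e"
    text = []
    for part in base.split("_"):           # "f_i" -> ["f", "i"]
        match = UNI_NAME.match(part)
        if part in KNOWN_NAMES:
            text.append(KNOWN_NAMES[part])
        elif match:
            digits = match.group(1)
            # Treat each group of four hex digits as one character.
            for i in range(0, len(digits), 4):
                text.append(chr(int(digits[i:i + 4], 16)))
        # Unknown components contribute nothing.
    return "".join(text)


print(glyph_name_to_text("e.end"))      # e
print(glyph_name_to_text("e_alt_end"))  # e
print(repr(glyph_name_to_text("glyph0042")))  # ''
```

    A name such as glyph0042 carries no character information at all, and nothing in a name can say that glyphs were re-ordered relative to the characters, which is why this route cannot be relied on in general.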

    Re the GSUB implementation in a font, you gave (for one case)

            e unicode_ee0f -> e_alt_end

    (I'll assume that each of the references here is to a glyph, not a character.) I'm not sure what you intend by the second input, "unicode_ee0f". If this is meant to correspond to a variation selector, then this is not how to implement this in an OpenType font. Instead, variation sequences are supported by using a format-14 cmap subtable. (See the 'cmap' table chapter of the OpenType specification for details.)
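
    To make the distinction concrete, here is a conceptual Python model of character-to-glyph mapping with variation sequences. It is only a sketch: a real format-14 subtable is a binary structure, and the use of U+FE00, the glyph names, and the function names are assumptions chosen for illustration, not a suggestion that such sequences are standardized for these letters.

```python
# Ordinary character-to-glyph mapping (hypothetical font).
DEFAULT_CMAP = {0x0065: "e", 0x0068: "h", 0x0074: "t"}

# Variation sequences this hypothetical font supports:
# (base character, variation selector) -> glyph name.
VARIATION_SEQUENCES = {
    (0x0065, 0xFE00): "e_alt_end",
    (0x0074, 0xFE00): "t_alt_end",
}


def is_variation_selector(cp):
    """True for VS1..VS16 and VS17..VS256."""
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def map_to_glyphs(text):
    """Map characters to glyph names, consuming variation selectors."""
    glyphs = []
    i = 0
    while i < len(text):
        base = ord(text[i])
        nxt = ord(text[i + 1]) if i + 1 < len(text) else None
        default = DEFAULT_CMAP.get(base, ".notdef")
        if nxt is not None and is_variation_selector(nxt):
            # Unsupported sequence: fall back to the default glyph and
            # let the selector disappear.
            glyphs.append(VARIATION_SEQUENCES.get((base, nxt), default))
            i += 2
        else:
            glyphs.append(default)
            i += 1
    return glyphs


print(map_to_glyphs("the\uFE00"))  # ['t', 'h', 'e_alt_end']
print(map_to_glyphs("h\uFE00"))    # ['h']
```

    The point of the model is that the selector is resolved while mapping characters to glyphs, before any GSUB lookup runs, so it never appears as a glyph for a substitution rule to match.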

    If, on the other hand, you wanted to use an OpenType feature, such as the 'falt' feature, then you would create a GSUB type 1 lookup table, which does a one-to-one substitution:

            e -> e_alt_end

    Then you would add a feature table for 'falt' that references the aforementioned lookup. If you wanted to support several alternates, you could use the Stylistic Set features ('ss01' - 'ss20') with type 1 (single substitution) lookups, or you could use the All Alternates feature ('aalt') with type 3 (alternate substitution) lookups. (The latter would be designed for use with UI that presents a palette of glyph alternatives.)
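
    The feature-based approach can be sketched in Python as follows. This is a simulation under stated assumptions, not font code: the lookup tables, the glyph name t_swash, and the function names are hypothetical, and the decision to apply the lookup only to the last glyph of each line is modelled here as the application's job.

```python
# Type 1 (single substitution): one glyph in, one glyph out.
FALT_LOOKUP = {"e": "e_alt_end", "h": "h_alt_end", "t": "t_alt_end"}

# Type 3 (alternate substitution): one glyph in, a list of candidates out.
AALT_LOOKUP = {"t": ["t_alt_end", "t_swash"]}


def apply_falt(lines, enabled=True):
    """Substitute the final glyph of each line when the feature is on."""
    result = []
    for glyphs in lines:
        glyphs = list(glyphs)
        if enabled and glyphs:
            # Glyphs without an entry in the lookup are left unchanged.
            glyphs[-1] = FALT_LOOKUP.get(glyphs[-1], glyphs[-1])
        result.append(glyphs)
    return result


def alternates_for(glyph):
    """Candidates a glyph palette could offer for one glyph."""
    return AALT_LOOKUP.get(glyph, [])


lines = [["t", "h", "e"], ["a", "b"]]
print(apply_falt(lines))                 # [['t', 'h', 'e_alt_end'], ['a', 'b']]
print(apply_falt(lines, enabled=False))  # [['t', 'h', 'e'], ['a', 'b']]
print(alternates_for("t"))               # ['t_alt_end', 't_swash']
```

    Because the feature is switched on by the user rather than encoded in the text, the underlying character sequence stays exactly as typed.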


    -----Original Message-----
    From: [] On Behalf Of William_J_G Overington
    Sent: Saturday, August 14, 2010 1:48 AM
    To: Unicode Mailing List; John H. Jenkins
    Subject: Re: Accessing alternate glyphs from plain text

    Thank you for taking the time to produce the pdf and thank you also for sharing the result.
    On Thursday 12 August 2010, John H. Jenkins wrote:
    > You seem to be missing a couple of
    > important points here which Peter is illustrating.
    > First of all, what you want to do can be done with existing
    > technology.  There's no need to add variation selectors or other
    > mechanisms to achieve your goal.
    Well, the test would seem to be as follows regarding OpenType and a pdf.
    1. Copy the plain text from the Unicode post onto the clipboard.
    2. Paste from the clipboard into an OpenType-aware application from which a pdf can be produced.
    3. Select All on the pasted text and choose an appropriate font.
    4. Unselect the text.
    5. Produce a pdf.
    6. Display the pdf using Adobe Reader.
    7. Assess the evidence question number 1. Does the display show text using the basic glyphs for the font for all of the text except for the last character of each verse of the poem; and for the last character of each verse of the poem display an alternate ending glyph?
    8. Copy all of the text from the pdf onto the clipboard.
    9. Paste from the clipboard into a basic wordprocessing program (for example, on a PC, Microsoft WordPad) and format the text using a general TrueType font, such as, for example, Arial.
    10. Assess the evidence question number 2. Is the text of the original poem displayed in WordPad?
    Both questions need answers of yes for the combination of OpenType-aware application program and font to have done the task as required.
    It seems to me that, at the present time, a necessary condition to pass that test is that the font used is an OpenType font that has alternate ending glyphs for e, h and t and has entries in its GSUB table, the table used for storing glyph substitution rules for the font, along the following lines. (Please know that my knowledge of OpenType GSUB tables is not great, so my way of expressing the rules here may well be different from the regular way, but hopefully the ideas that I am trying to express will be clear: experts are welcome to correct my way of expressing it please.)
    e unicode_ee0f -> e_alt_end
    h unicode_ee0f -> h_alt_end
    t unicode_ee0f -> t_alt_end
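
    Read literally, each rule above replaces a two-glyph sequence with a single glyph, which has the shape of a many-to-one (ligature-style) substitution. A minimal Python sketch of that reading follows; the rule table and function name are assumptions for illustration, and this is one interpretation of the notation rather than a statement of what was intended.

```python
# (first glyph, second glyph) -> replacement glyph.
RULES = {
    ("e", "unicode_ee0f"): "e_alt_end",
    ("h", "unicode_ee0f"): "h_alt_end",
    ("t", "unicode_ee0f"): "t_alt_end",
}


def substitute_pairs(glyphs):
    """Replace each matching two-glyph sequence with its single glyph."""
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in RULES:
            out.append(RULES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


print(substitute_pairs(["t", "h", "e", "unicode_ee0f"]))  # ['t', 'h', 'e_alt_end']
print(substitute_pairs(["a", "unicode_ee0f"]))            # ['a', 'unicode_ee0f']
```

    The second call shows that a base glyph with no matching rule leaves the Private Use Area glyph in the output.
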
    Now, I am unaware of any such OpenType font existing at the present time. It is possible that one does, because the encoding information needed to produce it has been available since Thursday when I published the poem in this mailing list.
    > Secondly, fonts are themselves works of art, and a well-designed face
    > will have a set of swashes appropriate for that face but not necessarily
    > another face.  Simply saying "I want a swash here" isn't enough.  On a
    > Mac, for example, Hoefler Text Italic has one swash available for the
    > "t", whereas Zapfino has three, none of which are like the swash
    > Hoefler Text Italic provides, and one of which is inappropriate for
    > use at the end of a line.  Most fonts won't have any, because swashes
    > are usually seen as the purview of calligraphic fonts.
    > So what do you do?  Do you provide a variation selector for every kind
    > of swash a font designer might include to make sure you get the
    > "right" one?  Or do you just say, "Put a swash in here, I don't care
    > what it looks like?"  Neither seems like a good idea.
    Well, neither of the choices offered is how I would proceed.
    If the idea becomes incorporated into regular Unicode there would be example glyphs. The encoding would be decided by discussion.
    Yet, to answer your question, I would probably encode several alternatives for an ending glyph. For example, again using Private Use Area codes here for correctness at the present time, U+EE0F for an ordinary ending glyph that is about twice the width of the basic glyph and U+EE0E for an ending glyph that is about five times the width of the basic glyph (except for an m, where it would be so that it matched the other characters): in each case recommending implementation in the font with the glyph having the same advance width as the basic glyph, so that a following apostrophe could be used following the glyph, for use in Esperanto poetry.
    In order to access an alternate starting glyph, for these experiments one could use U+EE0C. This would mean that the swash capitals of an italic font could be accessed from plain text. For these experiments, if U+EE0C were used with a regular (that is, not italic) font then there would be a wrong display, unless the regular font had rules such as the following within it.
    A unicode_ee0c -> A
    However, if U+FE0C were encoded within regular Unicode for the purpose, then that would not, if I understand variation selector rules correctly, be a problem, as the U+FE0C would be treated much as if it were a zero-width space by the application program if the font did not have a rule using it for the particular character.
    > Typography is not done with plain text.
    > Just to illustrate *my* point, I'm adding a PDF of four of the huge
    > number of possibilities for laying out your first stanza with Zapfino
    > on a Mac.  Which one did the poet intend?
    Well, what I intended was that all of the characters except for the last character of each verse, a total of three characters, would be typeset using the basic glyphs of the font and that the last character of each verse would be typeset using an alternate ending glyph. Based solely on the glyph complement pdfs that Adobe make available on the web, the Arno Pro and Arno Pro Italic fonts would both seem suitable.

    William Overington
    14 August 2010
