Re: 'No Character' symbol

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Mon May 01 2006 - 04:32:21 CST

    John H. Jenkins wrote on Monday, May 01, 2006 at 3:54 AM

    >> In most applications, if a character is requested that isn't
    >> available in the current font, most applications will instead
    >> display an empty box. Essentially what I'm wondering is whether
    >> this box symbol is application specific, or font specific, and in
    >> the latter case, what the character symbol is.
    >
    > It's actually font-specific (at least in TrueType/OpenType fonts).
    > It typically corresponds to no character in the font; it's a
    > character-less glyph.

    I'll expand on this, as fonts can contain many character-less glyphs, of
    various degrees!

    The basic point is that the glyphs in a font in the OpenType format are
    identified by number, and glyph 0 is the one to be used when there is no
    proper glyph for a character. (PostScript fonts work by name, and the
    corresponding glyph is identified by the name '.notdef'.) This glyph is
    sometimes known as the 'missing glyph'.
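
    As a rough sketch of the point above (a toy model, not any real font
    library's API; the glyph list and the helper names are invented for
    illustration), a font can be treated as an ordered list of glyph names
    in which position 0 is reserved for the missing glyph:

```python
# Toy model: a glyph ID is simply a position in this list (hypothetical data).
GLYPH_ORDER = [".notdef", "space", "A", "B", "macron"]

MISSING_GLYPH_ID = 0  # OpenType-style: the missing glyph is found by number


def glyph_id_by_name(name):
    """PostScript-style lookup by name; unknown names fall back to '.notdef'."""
    if name in GLYPH_ORDER:
        return GLYPH_ORDER.index(name)
    return GLYPH_ORDER.index(".notdef")


def glyph_name(glyph_id):
    """Name of the glyph with the given ID; out-of-range IDs give glyph 0."""
    if 0 <= glyph_id < len(GLYPH_ORDER):
        return GLYPH_ORDER[glyph_id]
    return GLYPH_ORDER[MISSING_GLYPH_ID]
```

    Either route ends at the same glyph: asking for an unknown name, or for
    an ID the font does not have, yields '.notdef'.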

    Fonts in the OpenType format have a table called 'cmap', which maps
    character codes to glyphs. In fact, it may contain several look-up
    tables, for different coding systems (even UCS-2 vs.
    full-range Unicode) and different platforms - for example, the Apple logo
    should not be accessible on a Windows platform! This is not the end of the
    story, for the conversion may be further refined (in accordance with data
    tables stored in the font), e.g. to support all the complexities of Indic
    and Arabic shaping. To give a Latin script example, the width of a macron
    may depend on what it is placed above, with different glyphs for different
    widths. In this case there will be multiple glyphs all corresponding, in a
    sense, to U+0304, though the cmap will map U+0304 to a specific glyph.
    (This dependence is a necessity for a combining overline, U+0305, if it is
    to be handled by normal mechanisms.)
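
    The two stages described above can be sketched as follows. This is only
    an illustration under stated assumptions: every table, glyph name and
    function here is invented, and the 'narrow macron' rule stands in for
    whatever refinement data a real font would carry. The (platformID,
    encodingID) keys follow the OpenType numbering, where (3, 1) is
    Windows/Unicode BMP and (1, 0) is Macintosh/Roman.

```python
# Toy font: every table below is invented for illustration.
GLYPH_ORDER = [".notdef", "a", "i", "macron", "macron.narrow", "applelogo"]
GID = {name: gid for gid, name in enumerate(GLYPH_ORDER)}

# cmap subtables keyed by (platformID, encodingID).
# The Apple logo is reachable only through the Macintosh subtable.
CMAP = {
    (3, 1): {0x61: GID["a"], 0x69: GID["i"], 0x0304: GID["macron"]},
    (1, 0): {0x61: GID["a"], 0x69: GID["i"], 0xF0: GID["applelogo"]},
}

# Hypothetical refinement rule: after these base glyphs, use a narrower macron.
NARROW_BASES = {GID["i"]}


def map_codes(codes, platform=(3, 1)):
    """Stage 1: character codes to glyph IDs; unmapped codes give glyph 0."""
    subtable = CMAP[platform]
    return [subtable.get(code, 0) for code in codes]


def refine(glyphs):
    """Stage 2: swap the default macron for a width-specific variant."""
    out = list(glyphs)
    for i in range(1, len(out)):
        if out[i] == GID["macron"] and out[i - 1] in NARROW_BASES:
            out[i] = GID["macron.narrow"]
    return out


def shape(text):
    """Run both stages on Unicode text using the Windows subtable."""
    glyphs = refine(map_codes([ord(ch) for ch in text]))
    return [GLYPH_ORDER[g] for g in glyphs]
```

    Here the cmap always maps U+0304 to the one glyph 'macron', yet
    shape("i\u0304") ends with 'macron.narrow', so two glyphs correspond to
    the same character. Likewise map_codes([0xF0], (1, 0)) finds the Apple
    logo, while the same code under (3, 1) falls through to glyph 0.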

    For simplicity in the construction of the cmap, some characters may actually
    be mapped to the missing glyph. In these days of font substitution, I
    suspect this is a bad idea.

    Finally, there may be other character-less glyphs that simply cannot be
    accessed via character codes at all.
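
    Both cases can be picked out mechanically. The sketch below assumes a
    toy font whose cmap is a plain dictionary from character code to glyph
    ID; the data and the function names are made up for the example.

```python
# Hypothetical glyph list; glyph IDs are positions in the list.
GLYPH_ORDER = [".notdef", "a", "macron", "macron.wide", "f_i"]

# Hypothetical cmap; U+00AD is deliberately mapped to the missing glyph.
CMAP = {0x61: 1, 0x0304: 2, 0x00AD: 0}


def codes_mapped_to_missing(cmap):
    """Character codes that the cmap explicitly sends to glyph 0."""
    return sorted(code for code, gid in cmap.items() if gid == 0)


def unreachable_glyphs(cmap, glyph_order):
    """Glyphs (other than glyph 0) that no character code maps to."""
    reachable = set(cmap.values())
    return [name for gid, name in enumerate(glyph_order)
            if gid != 0 and gid not in reachable]
```

    With this data, codes_mapped_to_missing(CMAP) reports U+00AD, and
    unreachable_glyphs(CMAP, GLYPH_ORDER) reports 'macron.wide' and 'f_i',
    which could only ever be reached through the font's own refinement
    data.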

    Richard.


