Re: "Missing character" glyph- example

From: James Kass (
Date: Thu Aug 01 2002 - 22:08:51 EDT

Peter Constable wrote,

> ... For instance, in Times New Roman, Arial, Tahoma and even
> James' own Code2000, the first entry in the cmap is for U+0020:

Please note that the first entry in the cmap covers Glyph ID 3.
Glyph IDs 0, 1, and 2 don't need to be covered by cmap, as they
are constants which are supposed to be handled by default.

Glyph ID Zero is the first glyph in every font (TTF/OTF).

Zero = Null ---> this is the glyph used for any code point
not covered by the font, that is to say not included in
the cmap (character map).
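
As a rough sketch of that fallback (not any particular rasterizer's
code), the lookup might go something like this. The toy cmap and the
helper names glyph_for and glyphs_for_text are made up for
illustration; a real font stores its cmap in binary subtables and the
glyph IDs will differ from font to font.

```python
# Toy cmap: code point -> glyph ID. Hypothetical data, not read from a real font.
NOTDEF_GID = 0  # glyph ID 0 is the 'missing glyph'

cmap = {
    0x0020: 3,   # space (first cmap entry, glyph ID 3)
    0x0041: 36,  # LATIN CAPITAL LETTER A (assumed glyph ID)
    0x0042: 37,  # LATIN CAPITAL LETTER B (assumed glyph ID)
}


def glyph_for(code_point, cmap):
    """Return the glyph ID for code_point, or glyph 0 if the cmap lacks it."""
    return cmap.get(code_point, NOTDEF_GID)


def glyphs_for_text(text, cmap):
    """Map each character of text to a glyph ID using the fallback above."""
    return [glyph_for(ord(ch), cmap) for ch in text]
```

Note that glyph 0 never needs an entry of its own in the cmap; it is
simply what comes back when the lookup fails.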

Unfortunately, entering "&#0000;" in a web page will only display
the string ampersand, number sign, zero, zero, zero, zero, semi-colon.

John Hudson wrote,

> If, by 'missing glyph', you mean the .notdef glyph it should indeed be the
> first glyph in the repertoire (but alas, may not be due to bad font tools),

Bad font tools may allow a designer to place a LATIN CAPITAL LETTER A
glyph first in the font. By definition, in that bad font, LATIN CAPITAL
LETTER A would be used for 'missing glyph'.

A good font tool should allow a designer to draw their interpretation
of the 'missing glyph', though. Some designers use their own logo
as 'missing glyph', and a designer with a wicked sense of humour
and a poor sense of perspective might even make the 'missing
glyph' look just like LATIN CAPITAL LETTER A.

<John Hudson continues>
> but it should *not* be encoded as U+0000 or as any other codepoint. .notdef
> should be unencoded.
> The first four glyphs in a font should be:
> .notdef (unencoded, symbolic glyph signifying missing glyph)
> .null (sometimes call NUL or NULL, U+0000, usually zero-width sans
> outline)
> CR (U+000D, usually zero-width sans serif)
> space (U+0020, often double-mapped to U+00A0)

(Smile) What is the difference between a zero-width sans serif
glyph and a zero-width serif glyph?

Seriously, aside from the typo, John Hudson is essentially correct.
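
For anyone who wants to check a font against the layout John lists,
here is a minimal sketch. It assumes the glyph names have already been
pulled out of the font into a plain list in glyph ID order, and the
cmap into a dict; the function name check_layout and both data shapes
are my own invention, not part of any font tool. It only tests the
points nobody disputes: the order of the first four glyphs, and that
glyph 0 is unencoded.

```python
EXPECTED_FIRST = [".notdef", ".null", "CR", "space"]


def check_layout(glyph_order, cmap):
    """Return a list of problems found; an empty list means the layout is as expected.

    glyph_order: glyph names indexed by glyph ID (hypothetical input shape).
    cmap: dict mapping code point -> glyph ID (hypothetical input shape).
    """
    problems = []

    # The first four glyphs should appear in the conventional order.
    for gid, name in enumerate(EXPECTED_FIRST):
        actual = glyph_order[gid] if gid < len(glyph_order) else None
        if actual != name:
            problems.append(f"glyph {gid} is {actual!r}, expected {name!r}")

    # Glyph 0 should not be reachable from any code point.
    for cp in sorted(cp for cp, gid in cmap.items() if gid == 0):
        problems.append(f"glyph 0 is encoded as U+{cp:04X}")

    return problems
```

A font built with a bad tool that puts LATIN CAPITAL LETTER A first
would be reported on both counts, since glyph 0 would have the wrong
name and would also be mapped from U+0041.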

The conventions John mentions were originally part of the
Macintosh character set. The PostScript names "notdef", ".null",
and "CR" in the older TTF specs have no Unicode value assigned
at all. Assigning 0x0 to .null and 0xd to CR were originally
Macintosh conventions. Indeed, these hex numbers are called
"US Macintosh character code for glyph" in the old TTF specs.

Even though notdef, .null, and CR were not part of either the
UGL character set or the US Win31 character set, they are
included in the WGL4 character set.

"notdef", ".null", and "CR" are all unencoded.

I've always considered "notdef" and ".null" to be semantically
equal. Technically, though, this is incorrect.

Best regards,

James Kass.

This archive was generated by hypermail 2.1.2 : Thu Aug 01 2002 - 19:53:56 EDT