Re: Basic question: types of diacritics marks

From: Eric Muller (
Date: Fri Jul 19 2002 - 19:37:11 EDT wrote:

>I've little knowledge of PDF creators. I'm working with the understanding
>that (in the absence of a creator having added any glyph -> char mapping),
>the characters would be inferred (at least by Acrobat) from the PostScript
>names of the glyphs in the PDF.
Correct (for name-keyed fonts; the rules are different for CID-keyed
fonts).

But the glyph -> char mapping (aka "the ToUnicode cmap") is still not
the full answer. For one thing, it's mostly doing the computation that
the PDF consumer does, and sticking the result in the PDF. More
importantly, a glyph -> char mapping has severe limitations; in general,
it is not possible to invert the layout process (which did the original
char -> glyph mapping; can you reverse bidi?).
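
To make the reordering problem concrete, here is a minimal sketch in Python. It is a toy, not real bidi: `toy_layout` and `naive_extract` are hypothetical names, and uppercase letters stand in for right-to-left characters (an assumption made only for illustration). It shows that applying a per-glyph map in glyph order yields the visual order, not the logical character order.

```python
import re

def toy_layout(logical):
    """Crude stand-in for bidi reordering (NOT the real algorithm):
    each run of uppercase letters, standing in for RTL text, is reversed."""
    return re.sub(r"[A-Z]+", lambda m: m.group(0)[::-1], logical)

def naive_extract(glyphs):
    """Apply an identity glyph -> char map, glyph by glyph, in glyph order."""
    return "".join(glyphs)

logical = "abc DEF"
glyphs = list(toy_layout(logical))   # glyph stream in visual order
extracted = naive_extract(glyphs)

# The per-glyph map restores the right characters in the wrong order.
print(extracted == logical)
```

A consumer would have to undo the reordering as well, and with real bidi rules that is not always possible from the glyph stream alone.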

What I was alluding to is that the PDF format supports the explicit
representation of the characters, and their correlation with the glyph
stream. Using this mechanism, you can obviously restore precisely the
character content. However, I am not aware of any PDF generator that does that.

The glyph->char map works in many cases, so it is a useful optimization
(in particular the correlation between characters and glyphs is
expensive). What every producer fails to do is: "here is my glyph
string; let's see if the glyph->char map restores the characters I want;
no? ok, let's insert the character string". In some cases, they're not in
a position to do it.
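
The check described above could look roughly like the following Python sketch. The function name `needs_explicit_text`, the glyph names, and the table `GLYPH_TO_CHARS` are all hypothetical; a real producer would work from its own font and layout data.

```python
def needs_explicit_text(glyphs, intended, glyph_to_chars):
    """Return True when the glyph -> char map alone does not restore
    `intended`, i.e. when the explicit character string should be emitted."""
    try:
        recovered = "".join(glyph_to_chars[g] for g in glyphs)
    except KeyError:
        return True  # a glyph with no mapping can never be restored
    return recovered != intended

# Hypothetical map for a tiny font: the ligature glyph restores "f" + "i".
GLYPH_TO_CHARS = {"f": "f", "i": "i", "n": "n", "f_i": "fi"}

# Text typed as U+0066 U+0069: the map is enough.
plain = needs_explicit_text(["f_i", "n"], "fin", GLYPH_TO_CHARS)

# Text typed as U+FB01: the map restores "fin", not the original string.
ligature = needs_explicit_text(["f_i", "n"], "\ufb01n", GLYPH_TO_CHARS)
```

The point of the design is that the expensive explicit representation is only paid for on the strings where the cheap map gives the wrong answer.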

>Thus, if two characters correspond to one
>PostScript name, then they can't both be round-tripped when copying from
>Acrobat. I can easily believe that the creator is a factor in this.
Yes, it is best to avoid mapping two different characters (or character
strings, taking into account GSUB) to the same glyph. It can only help
and is not too costly. But sometimes, it's ok: for example, I don't
believe that we need two "f_i" glyphs, one to restore U+0066 U+0069, the
other to restore U+FB01. Always restoring U+0066 U+0069 is ok. The same
argument is even stronger for canonical equivalents.
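
One way to see the difference between the two cases is with Unicode normalization, sketched here using Python's standard `unicodedata` module. The helper `same_text` is a hypothetical name introduced for illustration. Canonical equivalents compare equal under NFC, while U+FB01 and U+0066 U+0069 only compare equal under a compatibility form such as NFKC, which is my reading of why the argument is stronger for canonical equivalents.

```python
import unicodedata

def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the given form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

LIGATURE = "\ufb01"        # U+FB01 LATIN SMALL LIGATURE FI
PLAIN = "\u0066\u0069"     # U+0066 U+0069

# Compatibility equivalence only: distinct under NFC, equal under NFKC.
lig_canonical = same_text(LIGATURE, PLAIN, "NFC")
lig_compat = same_text(LIGATURE, PLAIN, "NFKC")

# Canonical equivalence: precomposed e-acute vs. e + combining acute.
acute_canonical = same_text("\u00e9", "e\u0301", "NFC")
```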

