From: Eric Muller (firstname.lastname@example.org)
Date: Sat Feb 09 2008 - 19:26:03 CST
James Kass wrote:
> Now, I don't know where those extra spaces are coming from, but I bet
> they make searching difficult.
Acrobat (Pro and Reader) tries to reconstruct the text correctly even
under adversarial conditions. The spaces are a side effect of trying to
get the best results across a wide range of PDF documents.
Slightly longer answer:
In many cases, PDF generation is hooked at a fairly late stage of the
pipeline that goes from the user input to a printed image. For an input
like "the car" you can end up with PDF content of the form (using a
(the car) showstring
(the) showstring 50 advance (car) showstring
To accommodate the latter case, Acrobat needs to generate a space
character when there is no space glyph. Because there are many
complications of the same nature, the conditions under which to generate
a space character are nontrivial, and most likely involve some
compromises. Furthermore, it is quite likely that the class of PDFs
corresponding to Indic texts was not considered when determining those
conditions.
Maybe the conditions that are actually coded in Acrobat can be refined
to work better for Indic texts; maybe there are inherent conflicts with
other PDFs (I just don't know).
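
To make the "the car" example concrete, here is a minimal sketch of a
gap-based heuristic of the general kind described above. It is not
Acrobat's actual algorithm, which I am not describing here. The Run
type, the reconstruct function, and the threshold value are all
hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One show-string operation (hypothetical model, not a PDF API)."""
    text: str     # characters drawn by this operation
    x: float      # horizontal start position of the run
    width: float  # total advance width of the run


def reconstruct(runs, font_size=100.0, space_threshold=0.25):
    """Join runs, inserting a space where the gap between runs is wide.

    The threshold (a fraction of the font size) is an assumed value;
    a real extractor would need far more elaborate conditions.
    """
    pieces = []
    prev_end = None
    for run in runs:
        if prev_end is not None:
            gap = run.x - prev_end
            # Do not double up if a real space character is already there.
            already_spaced = pieces[-1].endswith(" ") or run.text.startswith(" ")
            if gap > space_threshold * font_size and not already_spaced:
                pieces.append(" ")
        pieces.append(run.text)
        prev_end = run.x + run.width
    return "".join(pieces)


# Form 1: a single string that contains a real space character.
one_string = [Run("the car", 0, 350)]
# Form 2: two strings separated by an advance of 50 and no space glyph.
two_strings = [Run("the", 0, 150), Run("car", 200, 150)]

print(reconstruct(one_string))
print(reconstruct(two_strings))
```

Both forms come out as the same text under these assumptions. A single
fixed threshold is exactly where the compromises come in: set it too low
and spaces appear inside words whose glyphs are positioned individually,
set it too high and real word breaks are missed.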