From: Rick Cameron (Rick.Cameron@businessobjects.com)
Date: Thu Oct 26 2006 - 12:03:36 CST
Here's my understanding of how PDF files are structured, and why copying
& pasting from the file cited by Andreas does not work. Anyone who has
better knowledge of the mysteries of PDF, please feel free to correct me.
A PDF file doesn't actually contain characters; it contains fonts and
glyph indices. In order for conversion to characters (Unicode or
otherwise) to be possible, the PDF file must specify the mapping from
glyph indices to Unicode code points (for each font, I think).
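
To make the idea concrete, here is a minimal sketch in Python of what such a per-font mapping does. In real PDF files this role is usually played by the font's ToUnicode CMap; the table below, its glyph indices, and the function name glyphs_to_text are all invented for illustration and are not taken from any actual file.

```python
# Hypothetical glyph-index-to-Unicode table for one font. The glyph
# indices are invented; a real font would define its own numbering.
TO_UNICODE = {
    3: "\u0633",   # ARABIC LETTER SEEN
    17: "\u0644",  # ARABIC LETTER LAM
    42: "\u0627",  # ARABIC LETTER ALEF
    58: "\u0645",  # ARABIC LETTER MEEM
}


def glyphs_to_text(glyph_ids, to_unicode):
    """Map each glyph index to its character.

    A glyph index with no entry in the table becomes U+FFFD
    (REPLACEMENT CHARACTER), since the character cannot be recovered.
    """
    return "".join(to_unicode.get(g, "\ufffd") for g in glyph_ids)


# What the page actually stores: a sequence of glyph indices, not characters.
page_glyphs = [3, 17, 42, 58]
text = glyphs_to_text(page_glyphs, TO_UNICODE)
```

With the table present, the four glyph indices come back as four Arabic letters. Without it, the extractor has nothing to go on except the bare numbers.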
It appears that this PDF file doesn't have this mapping for the font
used for the Arabic characters. It looks to me like copy & paste causes
the raw glyph indices to be interpreted as code points, either in
Unicode or, perhaps, in my default native character set.
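
The two misreadings I suspect can be sketched in Python as follows. This is only an illustration of the guess above, not a description of what Adobe Reader actually does; the glyph indices are invented, and Windows-1252 (cp1252) is assumed as the native character set purely for the example.

```python
def misread_as_code_points(glyph_ids):
    """Treat each raw glyph index as if it were a Unicode code point."""
    return "".join(chr(g) for g in glyph_ids)


def misread_as_native_bytes(glyph_ids, encoding="cp1252"):
    """Treat each raw glyph index as a byte in a native 8-bit character set.

    Only works for indices in the range 0..255; bytes that are undefined
    in the chosen encoding are replaced with U+FFFD.
    """
    return bytes(glyph_ids).decode(encoding, errors="replace")


# Invented glyph indices standing in for three Arabic glyphs.
glyph_ids = [0x41, 0x80, 0xE9]

as_unicode = misread_as_code_points(glyph_ids)   # 'A', U+0080, 'é'
as_native = misread_as_native_bytes(glyph_ids)   # 'A', '€', 'é'
```

Either way the pasted text has nothing to do with the Arabic letters shown on the page, and the two interpretations differ only where the native character set departs from Unicode (here, index 0x80).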
- rick cameron
From: email@example.com [mailto:firstname.lastname@example.org] On
Behalf Of Andreas Prilop
Sent: Thursday, 26 October 2006 7:26
Subject: Re: extracting code values from PDF?
On Thu, 26 Oct 2006, Jukka K. Korpela wrote:
> On Windows, for example, you can select a piece of text (or even a
> character) in Adobe Reader using the text select tool, copy it onto
> the clipboard, paste it in MS Word or WordPad, position the cursor
> after a character and press Alt-x. The character is then replaced by
> its Unicode code number.
Have you actually tried it?
For example, take some Arabic letters from