Joined "ti" coded as "Ɵ" in PDF

From: Andrew Cunningham <>
Date: Sun, 20 Mar 2016 19:57:38 +1100


It is all smoke and mirrors.

For English .... you have to choose the right font. Simple, no advanced
features .... disable advanced typographic features in application if you

Ensure the cmap table in the font is sufficiently
comprehensive ....
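
A toy sketch of why the cmap coverage matters, under loud assumptions: the
glyph IDs, the four-letter "font" and the helper names below are all made up
for illustration, and a real extractor would consult the PDF's ToUnicode CMap
or font encoding rather than a Python dict. The point is only that a joined
"ti" glyph produced by a ligature feature has no character behind it unless
something maps it back.

```python
# Hypothetical glyph IDs for a tiny imaginary font (not taken from any real font).
CMAP = {"t": 10, "i": 11, "o": 12, "n": 13}   # character -> glyph ID
LIGATURES = {(10, 11): 99}                    # t + i -> joined "ti" glyph


def shape(text, use_ligatures):
    """Map characters to glyph IDs, optionally applying ligature substitution."""
    glyphs = [CMAP[ch] for ch in text]
    if not use_ligatures:
        return glyphs
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])   # two glyphs collapse into one
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


def extract(glyphs):
    """Recover text by inverting the cmap; unmapped glyphs become U+FFFD."""
    reverse = {gid: ch for ch, gid in CMAP.items()}
    return "".join(reverse.get(g, "\ufffd") for g in glyphs)
```

With ligatures disabled, extract(shape("notion", False)) gives back "notion".
With them enabled, the ligature glyph has no reverse mapping, so the "ti" is
lost on extraction.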

The issues Don raised still exist in PDF/A. You would need to make
fundamental changes to the PDF spec for it to work for any language.

For other languages, esp those in complex scripts the situation is more
dire ... esp when glyphs have been reordered.

The accepted workaround is ActualText. But you don't necessarily need
ActualText. Depends on font and language.

But the rub is that it is left to implementors to decide if and when the
ActualText is used. All aspects of the document ecosystem need to be
looked at, including which tools can use ActualText instead of the visible
text layer.
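
To make the "left to implementors" point concrete, here is a minimal sketch,
not how any particular PDF tool works. The Span class, the extract function
and the use_actual_text flag are hypothetical names. The sample uses the
Devanagari syllable "कि": logical order is KA (U+0915) followed by VOWEL SIGN I
(U+093F), but the vowel sign is drawn to the left of the consonant, so text
recovered in glyph order comes out reversed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    glyph_text: str                     # text recovered from glyphs, in painting order
    actual_text: Optional[str] = None   # replacement text, if the producer supplied any


def extract(spans, use_actual_text):
    """Join span text, preferring ActualText only when the tool opts in."""
    parts = []
    for span in spans:
        if use_actual_text and span.actual_text is not None:
            parts.append(span.actual_text)
        else:
            parts.append(span.glyph_text)
    return "".join(parts)


# Glyph order (vowel sign first) versus logical order stored as ActualText.
spans = [Span(glyph_text="\u093f\u0915", actual_text="\u0915\u093f")]
```

A tool that calls extract(spans, True) gets the logical order back; one that
ignores ActualText, or a file that never had it, gets the reordered glyph
sequence.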

The PDF/UA spec is probably closer to the mark than the PDF/A spec.

But since most archives have no control over pdf production, authors' or
publishers' font selection, tools used, etc, then working with PDFs can be
fairly hit and miss. For languages written in complex scripts, it's usually
a miss rather than a hit.

I rarely see ActualText in PDF files, even in those that need it.


On Sunday, 20 March 2016, Janusz S. Bien <> wrote:
> Quote/Cytat - Andrew Cunningham <> (Sun 20 Mar 2016 12:06:29 AM CET):
>> Hi Don,
>> Latin is fine if you keep to simple well made fonts and avoid using more
>> sophisticated typographic features available in some fonts.
>> Dumb it down typographically and it works fine. PDF, despite all the
>> current rhetoric coming from PDF software developers, is a preprint
>> format, not an archival format.
> What about PDF/A, ISO 19005-1:2005 Document Management – Electronic
> document file format for long term preservation?
> Best regards
> Janusz
> --
> Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra
> Lingwistyki Formalnej)
> Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics

Andrew Cunningham
Received on Sun Mar 20 2016 - 03:58:52 CDT
