From: James Kass (thunder-bird@earthlink.net)
Date: Fri Feb 08 2008 - 14:46:11 CST
John H. Jenkins replied,
>> 2/
>> My question was, mostly all proper publishing softwares do not yet
>> support complex rendering. How many years since Unicode come into being?
>> When is this going to be resolved, or do we plan on choosing an alternative encoding as Unicode is not working.
>>
>
> Well, what applications are you thinking of and on what platforms? As I say, Word on Windows is fine for almost everything in
> Unicode, and Pages on Mac OS X is fine for all of it. It is resolved now in that sense.
http://www.trigeminal.com/samples/provincial.html
From Michael Kaplan's page, "Anyone Can Be Provincial", I scraped four
script examples into a plain-text editor:
அவர்கள் ஏன் தமிழில் பேசக்கூடாது ?
რატომ არ ლაპარაკობენ ისინი ქართულად?
Ինչու՞ նրանք չեն խոսում Հայերեն
なぜ、みんな日本語を話してくれないのか?
I converted the plain-text file into PDF format using the CutePDF application.
Copy/pasting from the PDF back into a plain-text editor gives the following:
(PDF made from BabelPad)
அ வ க ள ் ஏ ன ் த ல ் ே ப ச க ் ட ா ?
რ ა ტ ო მ ა რ ლ ა პ ა რ ა კ ო ბ ე ნ ი ს ი ნ ი ქ ა რ თ უ ლ ა დ ?
Ի ն չ ո ւ ՞ ն ր ա ն ք չ ե ն խ ո ս ո ւ մ Հ ա յ ե ր ե ն
な ぜ 、み ん な 日 本 語 を話し て く れ な い の か ?
(PDF made from Notepad)
அவகள ் ஏன ் தல் ேபசக்டா ?
რატომ არ ლაპარაკობენ ისინი ქართულად?
Ի ն չ ո ւ ՞ ն ր ա ն ք չ ե ն խ ո ս ո ւ մ Հ ա յ ե ր ե ն
な ぜ 、み ん な 日 本 語 を話し て く れ な い の か ?
Now, I don't know where those extra spaces are coming from, but I bet
they make searching difficult.
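A minimal sketch of why the stray spaces would hurt searching. It does not reproduce the PDF round trip; it only simulates the "space after every character" pattern seen above, and the helper names are hypothetical. It also shows that the naive repair (removing all whitespace) makes a word findable again but destroys the real word boundaries.

```python
# Simulation only: re-creates the spacing pattern seen in the pasted text.
# Nothing here reflects how CutePDF or any PDF reader works internally.
original = "რატომ არ ლაპარაკობენ ისინი ქართულად?"


def simulate_spaced_extraction(text: str) -> str:
    """Drop the real spaces, then put one space between all characters."""
    return " ".join(text.replace(" ", ""))


def strip_all_whitespace(text: str) -> str:
    """Naive repair: remove every whitespace character."""
    return "".join(text.split())


extracted = simulate_spaced_extraction(original)
repaired = strip_all_whitespace(extracted)

# Search for the last word of the sentence, without its question mark.
word = original.split()[-1].rstrip("?")

found_in_original = word in original
found_in_extracted = word in extracted
found_after_repair = word in repaired

# After the naive repair the whole sentence has collapsed into one "word".
word_count_after_repair = len(repaired.split())

print(found_in_original, found_in_extracted, found_after_repair)
print(word_count_after_repair)
```

The plain substring search fails on the spaced text, and the repaired text can no longer be split into its original five words.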
Tamil is the complex script; the other three scripts are not complex.
The Tamil text as it comes back from the PDF is now stored in visual
order rather than the correct logical order. Further, presentation forms
generated via OpenType happen to be mapped to the Private Use Area
in the font I used, but they were not part of the original
data. Now they are showing up in the text, so it is clear that the
original textual information is getting lost.
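The two symptoms described here can be checked for mechanically. The following is a rough sketch, not a complete validator: the function names are hypothetical, the visual-order test is a heuristic that only looks at the three Tamil vowel signs drawn to the left of their consonant (U+0BC6, U+0BC7, U+0BC8), and the sample strings are made up for illustration rather than taken from the PDF.

```python
import unicodedata

# Tamil vowel signs E, EE and AI are rendered to the left of the consonant
# but, in logical order, are stored after it.
PREBASE_VOWEL_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}


def is_tamil_consonant(ch: str) -> bool:
    """True for code points in the Tamil consonant range KA..HA."""
    return "\u0b95" <= ch <= "\u0bb9"


def private_use_positions(text: str) -> list:
    """Indexes of characters whose general category is Co (private use)."""
    return [i for i, ch in enumerate(text) if unicodedata.category(ch) == "Co"]


def visual_order_positions(text: str) -> list:
    """Indexes of left-side vowel signs that do not follow a consonant."""
    flagged = []
    for i, ch in enumerate(text):
        if ch in PREBASE_VOWEL_SIGNS:
            prev = text[i - 1] if i > 0 else ""
            if not (prev and is_tamil_consonant(prev)):
                flagged.append(i)
    return flagged


# Illustrative samples (assumptions, not data recovered from the PDF).
logical = "\u0baa\u0bc7\u0b9a"          # PA + vowel sign EE + CA, logical order
visual = "\u0bc7\u0baa\u0b9a"           # same glyphs stored in visual order
with_pua = "\u0ba4\ue000\u0bb2\u0bcd"   # a private-use code point in Tamil text

print(visual_order_positions(logical), visual_order_positions(visual))
print(private_use_positions(with_pua))
```

Text that trips either check has lost information relative to the original encoding, since neither a private-use code point nor a leading vowel sign belongs in well-formed Tamil plain text.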
Using Notepad, with Latha font displaying the Tamil, and the same
CutePDF application, the PDF displays just fine, but copy/pasting
from it back into any plain-text editor gets a bunch of question
marks:
??????????????????????????
Using BabelPad with Latha and CutePDF results in:
? ? ? ?? ? ?? ? ?? ? ? ? ? ? ?? ? ? ? ? ? ?? ??
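For comparison, runs of question marks are what a lossy conversion to a non-Unicode code page typically leaves behind. The sketch below shows that effect in isolation. Whether anything like this happens in the Notepad/BabelPad to CutePDF path is an assumption, not something established above; the choice of cp1252 and the function name are likewise only illustrative.

```python
# Assumption for illustration: a legacy single-byte code page (cp1252)
# stands in for whatever non-Unicode step might exist in the pipeline.
tamil_word = "\u0b85\u0bb5\u0bb0\u0bcd\u0b95\u0bb3\u0bcd"  # first Tamil word of the sample


def lossy_round_trip(text: str, codec: str = "cp1252") -> str:
    """Encode to a legacy code page, replacing unencodable characters."""
    return text.encode(codec, errors="replace").decode(codec)


result = lossy_round_trip(tamil_word)
print(result)
```

Every Tamil code point comes back as a single "?", and nothing in the result allows the original characters to be recovered.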
It's this lack of support for complex scripts (and, by extension,
Unicode) in popular publishing applications which is so distressing
to users.
Best regards,
James Kass