Re: minimizing size (was Re: allocation of Georgian letters)

From: James Kass (thunder-bird@earthlink.net)
Date: Fri Feb 08 2008 - 14:46:11 CST

    John H. Jenkins replied,

    >> 2/
    >> My question was, mostly all proper publishing softwares do not yet support complex rendering. How many years since Unicode come
    >> into being?
    >> When is this going to be resolved, or do we plan on choosing an alternative encoding as Unicode is not working.
    >>
    >
    > Well, what applications are you thinking of and on what platforms? As I say, Word on Windows is fine for almost everything in
    > Unicode, and Pages on Mac OS X is fine for all of it. It is resolved now in that sense.

    http://www.trigeminal.com/samples/provincial.html

    From Michael Kaplan's page, "Anyone Can Be Provincial", I copied sample
    sentences in four scripts into a plain-text editor:

    அவர்கள் ஏன் தமிழில் பேசக்கூடாது ?
    რატომ არ ლაპარაკობენ ისინი ქართულად?
    Ինչու՞ նրանք չեն խոսում Հայերեն
    なぜ、みんな日本語を話してくれないのか?

    I converted the plain-text file into PDF format using the CutePDF application.

    Copy/pasting from the PDF back into a plain-text editor gives the following:

    (PDF made from BabelPad)
      அ வ  க ள ் ஏ ன ் த   ல ் ே ப ச க ் ட ா  ?
      რ ა ტ ო მ ა რ ლ ა პ ა რ ა კ ო ბ ე ნ ი ს ი ნ ი ქ ა რ თ უ ლ ა დ ?
      Ի ն չ ո ւ ՞ ն ր ա ն ք չ ե ն խ ո ս ո ւ մ Հ ա յ ե ր ե ն
      な ぜ 、み ん な 日 本 語 を話し て く れ な い の か ?

    (PDF made from Notepad)
    அவகள ் ஏன ் தல் ேபசக்டா ?
    რატომ არ ლაპარაკობენ ისინი ქართულად?
    Ի ն չ ո ւ ՞ ն ր ա ն ք չ ե ն խ ո ս ո ւ մ Հ ա յ ե ր ե ն
    な ぜ 、み ん な 日 本 語 を話し て く れ な い の か ?

    Now, I don't know where those extra spaces are coming from, but I bet
    they make searching difficult.
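
To illustrate the search problem, here is a minimal Python sketch using the Georgian line above. The query word and the whitespace-stripping workaround are assumptions for illustration only; nothing here reflects how any PDF viewer actually searches.

```python
# Original sentence and the text as it came back from the PDF (spaces inserted).
original = "რატომ არ ლაპარაკობენ ისინი ქართულად?"
extracted = "რ ა ტ ო მ ა რ ლ ა პ ა რ ა კ ო ბ ე ნ ი ს ი ნ ი ქ ა რ თ უ ლ ა დ ?"

# Hypothetical search term: the last word of the sentence.
query = "ქართულად"


def strip_whitespace(text):
    """Remove every whitespace character, including the real word breaks."""
    return "".join(text.split())


# A plain substring search works on the original but fails on the extracted text.
found_in_original = query in original
found_in_extracted = query in extracted

# Workaround: compare with all whitespace removed on both sides.
found_after_stripping = strip_whitespace(query) in strip_whitespace(extracted)
```

Stripping whitespace recovers the match for a simple script like Georgian, but it also discards the genuine word boundaries, and it does nothing for text whose characters were reordered or replaced.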

    Tamil is the only complex script of the four; the other three are not complex.

    The Tamil text as it comes back from the PDF is now stored in visual
    order rather than the correct (logical) order. Further, the presentation
    forms generated via OpenType happen to be mapped to the Private Use
    Area in the font I used. They were not part of the original data, yet
    they now show up in the extracted text, so it is clear that the
    original textual information is getting lost.
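
Both symptoms can be checked for mechanically. The following is a rough Python sketch, not a full Tamil validator: the function name `audit_tamil`, the report format, and the sample strings are all made up for illustration. It flags Private Use Area code points and left-side vowel signs that are stored before their consonant instead of after it.

```python
import unicodedata

# Tamil vowel signs that are drawn to the left of the consonant but must be
# stored after it in logical order: U+0BC6 (e), U+0BC7 (ee), U+0BC8 (ai).
PREBASE_VOWEL_SIGNS = {"\u0BC6", "\u0BC7", "\u0BC8"}


def is_tamil_consonant(ch):
    """True for code points in the Tamil consonant range U+0B95..U+0BB9."""
    return "\u0B95" <= ch <= "\u0BB9"


def audit_tamil(text):
    """Return a list of (index, problem, code point) tuples."""
    problems = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Co":
            # Private-use character: cannot be part of interchangeable text.
            problems.append((i, "private-use", f"U+{ord(ch):04X}"))
        elif ch in PREBASE_VOWEL_SIGNS:
            prev = text[i - 1] if i > 0 else ""
            if not (prev and is_tamil_consonant(prev)):
                # The sign does not follow a consonant: visual order.
                problems.append((i, "vowel-sign-before-consonant", f"U+{ord(ch):04X}"))
    return problems


# Logical order: PA + vowel sign EE + CA.
logical = "\u0BAA\u0BC7\u0B9A"
# Visual order, as in the extracted text: vowel sign EE + PA + CA.
visual = "\u0BC7\u0BAA\u0B9A"
# Hypothetical private-use glyph code leaking into the text.
with_pua = "\u0B85\uE000"

print(audit_tamil(logical))
print(audit_tamil(visual))
print(audit_tamil(with_pua))
```

A check like this only detects the damage. Recovering the original text would require knowing which glyph each private-use code stood for in the particular font, which the extracted text does not carry.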

    Using Notepad, with the Latha font displaying the Tamil, and the same
    CutePDF application, the PDF displays just fine, but copy/pasting
    from it back into any plain-text editor yields nothing but question
    marks:

    ??????????????????????????

    Using BabelPad with Latha and CutePDF results in:

    ? ? ? ?? ? ?? ? ?? ? ? ? ? ? ?? ? ? ? ? ? ?? ??

    It's this lack of support for complex scripts (and, by extension,
    Unicode) in popular publishing applications which is so distressing
    to users.

    Best regards,

    James Kass


