Re: minimizing size (was Re: allocation of Georgian letters)

From: James Kass (thunder-bird@earthlink.net)
Date: Sat Feb 09 2008 - 03:54:34 CST


    Doug Ewell wrote,

    > As much as I like BabelPad (it has replaced SC UniPad as my favorite
    > full-service-Unicode editor), I have had serious problems pasting text
    > into BabelPad from the clipboard. Sometimes there is a large chunk of
    > random text after the "real" data; there have been other symptoms as
    > well. I assume Andrew will be able to resolve these when he has a
    > chance to update the program.
    >
    > Except in the presence of bugs such as this, Unicode data can be copied
    > and pasted from one Unicode-aware program to another Unicode-aware
    > program with 100% fidelity, regardless of the encoding model.

    (Andrew responds well to reported problems, but how can he fix bugs
    in third-party PDF applications?)

    The operative phrase is "Unicode-aware application". I believe it would
    be possible to copy/paste text back-and-forth between BabelPad and
    Notepad until the mouse wore out without data corruption.
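
    As a rough illustration of that fidelity claim, here is a minimal
    Python sketch. It does not touch the clipboard or either editor; it
    only assumes that each application re-encodes the text in some
    Unicode encoding form, and the function name round_trip is
    hypothetical.

```python
# Minimal sketch: pass a string through several Unicode encoding forms,
# as two Unicode-aware programs might do when exchanging text.
# Nothing here models BabelPad, Notepad, or the clipboard itself.

def round_trip(text, encodings=("utf-8", "utf-16-le", "utf-32-be")):
    """Encode and decode text once per encoding, returning the result."""
    current = text
    for enc in encodings:
        current = current.encode(enc).decode(enc)
    return current

# Georgian, Tamil, and one supplementary-plane character.
sample = "\u10d0\u10d1 \u0ba4\u0bae\u0bbf\u0bb4\u0bcd \U00010c80"

result = sample
for _ in range(1000):  # "until the mouse wore out"
    result = round_trip(result)

print(result == sample)
```

    Each encoding form covers every Unicode scalar value, so the
    comparison holds no matter how many times the loop runs.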

    PDF has long been touted as *the* way to safely send text with the
    assurance that the recipients will be able to display that text exactly
    as the author intended. While it's true that the recipient sees what
    was intended, it does not seem to be true that actual text is being
    sent. Once the material is in PDF format, no further text processing
    appears to be possible; the actual text has been lost somewhere along
    the way. (ASCII text notwithstanding.)

    Without any real knowledge of the PDF format and what happens when
    converting a file to PDF, it appears to me that it is not text which is
    being embedded. Rather, the process is embedding glyphs. If a glyph
    is mapped to a Unicode value, at least some applications can return that
    value. But, if the glyph is not mapped to a Unicode value (which is
    normally the case with presentation forms used in complex scripts),
    there does not seem to be any effort made to preserve the Unicode
    string which generated the presentation form. And that's really a
    shame.
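
    The behaviour described above can be sketched with a toy model in
    Python. This is not the PDF format or any real library: the glyph
    names, the tables, and the functions shape and extract are all
    hypothetical, and the only assumption is that a document stores
    glyphs plus an optional glyph-to-Unicode table.

```python
# Toy model (hypothetical, not real PDF internals): a document stores
# glyph names, and extraction relies on a glyph-to-Unicode table.

REPLACEMENT = "\ufffd"  # shown when a glyph has no Unicode mapping

def shape(text, cmap, ligatures):
    """Convert a Unicode string to glyph names, trying ligatures first."""
    glyphs = []
    i = 0
    while i < len(text):
        for seq in sorted(ligatures, key=len, reverse=True):
            if text.startswith(seq, i):
                glyphs.append(ligatures[seq])
                i += len(seq)
                break
        else:
            glyphs.append(cmap[text[i]])
            i += 1
    return glyphs

def extract(glyphs, to_unicode):
    """Recover text from glyph names using whatever mapping exists."""
    return "".join(to_unicode.get(g, REPLACEMENT) for g in glyphs)

# Tamil KA, VIRAMA, SSA, MA; the first three form one presentation glyph.
cmap = {"\u0b95": "ka", "\u0bcd": "virama", "\u0bb7": "ssa", "\u0bae": "ma"}
ligatures = {"\u0b95\u0bcd\u0bb7": "k_ssa"}

text = "\u0b95\u0bcd\u0bb7\u0bae"
glyphs = shape(text, cmap, ligatures)

# Case 1: only directly encoded glyphs are mapped back to Unicode.
glyph_only = {g: ch for ch, g in cmap.items()}
lossy = extract(glyphs, glyph_only)

# Case 2: the ligature glyph also remembers the string that produced it.
with_source = dict(glyph_only)
with_source.update({g: seq for seq, g in ligatures.items()})
recovered = extract(glyphs, with_source)

print(glyphs, lossy == text, recovered == text)
```

    In the first case the presentation form comes back as U+FFFD and
    the original string is gone; in the second, keeping the source
    string alongside the glyph is enough to recover it.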

    Another shame is telling Tamil users that Unicode won't standardize
    a duplicate encoding until a certain event happens. This gives the
    misleading impression that there's at least a possibility that Unicode
    might encode TACE/TUNE.

    It would have been much better, in my opinion, to simply have told people
    up front that there is absolutely no possibility whatsoever for such a
    duplicate encoding in the standard. In which case, the people who have
    spent time and effort towards such an encoding could have been doing
    something productive with their time and resources instead of wasting
    them. Like, for example, solving problems with the PDF format
    related to complex scripts.

    Best regards,

    James Kass

    P.S. - There's a special FAQ page for Tamil encoding issues here:
    http://unicode.org/faq/tamil.html

    Suggested additions to that page might include:

    Q: Is there any possibility that a new character encoding scheme for
    Tamil which considers ligatures as characters will either be added to
    Unicode side-by-side with the existing Unicode Tamil encoding or
    replace the current Tamil Unicode encoding model altogether?

    A: No.


