Re: minimizing size (was Re: allocation of Georgian letters)

From: James Kass (thunder-bird@earthlink.net)
Date: Sat Feb 09 2008 - 03:54:34 CST

Next message: Joó Ádám: "Re: Old Hungarian script - progress?"

Previous message: Doug Ewell: "Re: minimizing size (was Re: allocation of Georgian letters)"
In reply to: Doug Ewell: "Re: minimizing size (was Re: allocation of Georgian letters)"
Next in thread: Bala: "RE: minimizing size (was Re: allocation of Georgian letters)"
Reply: Bala: "RE: minimizing size (was Re: allocation of Georgian letters)"
Reply: Doug Ewell: "Re: minimizing size (was Re: allocation of Georgian letters)"
Reply: Eric Muller: "Re: minimizing size (was Re: allocation of Georgian letters)"

Doug Ewell wrote,

> As much as I like BabelPad (it has replaced SC UniPad as my favorite
> full-service-Unicode editor), I have had serious problems pasting text
> into BabelPad from the clipboard. Sometimes there is a large chunk of
> random text after the "real" data; there have been other symptoms as
> well. I assume Andrew will be able to resolve these when he has a
> chance to update the program.
>
> Except in the presence of bugs such as this, Unicode data can be copied
> and pasted from one Unicode-aware program to another Unicode-aware
> program with 100% fidelity, regardless of the encoding model.

(Andrew responds well to reported problems, but how can he fix bugs
in third-party PDF applications?)

The operative phrase is "Unicode-aware application". I believe it would
be possible to copy/paste text back-and-forth between BabelPad and
Notepad until the mouse wore out without data corruption.
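
As a rough illustration of that fidelity claim, here is a minimal Python
sketch. It assumes that a copy/paste between two Unicode-aware programs
amounts to encoding the string in some Unicode encoding form and decoding
it again on the other side; the helper name round_trip and the sample
string are made up for this example and do not come from either program.

```python
# Sample text: TAMIL LETTER KA followed by TAMIL VOWEL SIGN O.
text = "\u0B95\u0BCA"

def round_trip(s, encoding):
    """Encode s and decode it again, as a stand-in for copy then paste."""
    return s.encode(encoding).decode(encoding)

# Whatever encoding form carries the data, the code points survive intact.
results = {enc: round_trip(text, enc) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

for enc, value in results.items():
    print(enc, value == text)
```

The loop can be repeated any number of times with the same outcome, since
each Unicode encoding form represents every code point losslessly.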

PDF has long been touted as *the* way to safely send text with the
assurance that the recipients will be able to display that text exactly
as the author intended. While it's true that the recipient sees what
was intended, it does not seem to be true that actual text is being
sent. Once the material is in PDF format, no further text processing
appears to be possible; the actual text has been lost somewhere along
the way. (ASCII text notwithstanding.)

Without any real knowledge of the PDF format and what happens when
converting a file to PDF, it appears to me that it is not text which is
being embedded. Rather, the process is embedding glyphs. If a glyph
is mapped to a Unicode value, at least some applications can return that
value. But, if the glyph is not mapped to a Unicode value (which is
normally the case with presentation forms used in complex scripts),
there does not seem to be any effort made to preserve the Unicode
string which generated the presentation form. And that's really a
shame.
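
The effect described above can be modelled with a toy Python sketch. This
is not how any real PDF producer or reader is implemented; the glyph
names, the tables CMAP and LIGATURES, and the functions shape and extract
are all hypothetical. It only assumes what is guessed at above: the file
keeps glyphs, and text is recovered through a glyph-to-Unicode lookup.

```python
# Toy model only: every name and table here is hypothetical.
CMAP = {"\u0B95": "ka", "\u0B9F": "tta", "\u0BBF": "sign_i"}  # character -> glyph
LIGATURES = {("tta", "sign_i"): "tti_lig"}  # glyph pair -> presentation form

def shape(text):
    """Map characters to glyphs, replacing known pairs with a ligature glyph."""
    glyphs = [CMAP[ch] for ch in text]
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out

def extract(glyphs, glyph_to_text):
    """Recover text from glyphs; unmapped glyphs become U+FFFD."""
    return "".join(glyph_to_text.get(g, "\uFFFD") for g in glyphs)

text = "\u0B95\u0B9F\u0BBF"  # KA, TTA, VOWEL SIGN I
glyphs = shape(text)         # only this glyph run is "embedded"

# Case 1: only glyphs that came straight from a character have a mapping.
reverse = {glyph: ch for ch, glyph in CMAP.items()}
lossy = extract(glyphs, reverse)

# Case 2: the producer also records the string that generated the ligature.
full = dict(reverse)
full["tti_lig"] = "\u0B9F\u0BBF"
recovered = extract(glyphs, full)

print(glyphs, lossy == text, recovered == text)
```

In the first case the ligature comes back as U+FFFD and the original
string is gone; in the second, keeping the generating string alongside
the presentation form is enough to get the text back.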

Another shame is telling Tamil users that Unicode won't standardize
a duplicate encoding until a certain event happens. This gives the
misleading impression that there's at least a possibility that Unicode
might encode TACE/TUNE.

It would have been much better, in my opinion, to simply have told people
up front that there is absolutely no possibility whatsoever for such a
duplicate encoding in the standard. In which case, the people who have
spent time and effort towards such an encoding could have been doing
something productive with their time and resources instead of wasting
them. Like, for example, solving problems with the PDF format
related to complex scripts.

Best regards,

James Kass

P.S. - There's a special FAQ page for Tamil encoding issues here:
http://unicode.org/faq/tamil.html

Suggested additions to that page might include:

Q: Is there any possibility that a new character encoding scheme for
Tamil which considers ligatures as characters will either be added to
Unicode side-by-side with the existing Unicode Tamil encoding or
replace the current Tamil Unicode encoding model altogether?

A: No.


This archive was generated by hypermail 2.1.5 : Sat Feb 09 2008 - 03:58:24 CST