Re: minimizing size (was Re: allocation of Georgian letters)

From: Doug Ewell ([email protected])
Date: Sat Feb 09 2008 - 13:50:10 CST

Next message: Doug Ewell: "Re: minimizing size (was Re: allocation of Georgian letters)"

Previous message: Doug Ewell: "Re: minimizing size (was Re: allocation of Georgian letters)"
In reply to: James Kass: "Re: minimizing size (was Re: allocation of Georgian letters)"
Next in thread: James Kass: "Re: minimizing size (was Re: allocation of Georgian letters)"
Reply: James Kass: "Re: minimizing size (was Re: allocation of Georgian letters)"

James Kass <thunder dash bird at earthlink dot net> wrote:

>> Except in the presence of bugs such as this, Unicode data can be
>> copied and pasted from one Unicode-aware program to another
>> Unicode-aware program with 100% fidelity, regardless of the encoding
>> model.
>
> (Andrew responds well to reported problems, but how can he fix bugs in
> third-party PDF applications?)

I am pretty sure this is a BabelPad bug, related to the pasting of text
into BabelPad, not the copying of text from PDF.

> The operative phrase is "Unicode-aware application". I believe it
> would be possible to copy/paste text back-and-forth between BabelPad and
> Notepad until the mouse wore out without data corruption.

At the risk of dragging an otherwise excellent text editor further
through the mud, and solely in the interest of improving BP, I can try
to produce an example where extra crud is pasted into BP after the
"real" text.

> PDF has long been touted as *the* way to safely send text with the
> assurance that the recipients will be able to display that text
> exactly as the author intended. While it's true that the recipient
> sees what was intended, it does not seem to be true that actual text
> is being sent. Once the material is in PDF format, no further text
> processing appears to be possible; the actual text has been lost
> somewhere along the way. (ASCII text notwithstanding.)

This is an important point: for at least some applications of PDF, the
recipient can display the text exactly as the author intended, but
cannot necessarily do anything else with it.

> Another shame is telling Tamil users that Unicode won't standardize a
> duplicate encoding until a certain event happens. This gives the
> misleading impression that there's at least a possibility that Unicode
> might encode TACE/TUNE.

Indeed, as I have said many times. Regardless of how firm someone may
have actually been in a meeting, the reports and meeting minutes have
consistently indicated that encoding TACE/TUNE in Unicode is a
possibility, which is either misleading to the proponents (if false) or
a complete destabilizing of Tamil in Unicode (if true).

> P.S. - There's a special FAQ page for Tamil encoding issues here:
> http://unicode.org/faq/tamil.html
>
> Suggested additions to that page might include:
>
> Q: Is there any possibility that a new character encoding scheme for
> Tamil which considers ligatures as characters will either be added to
> Unicode side-by-side with the existing Unicode Tamil encoding or
> replace the current Tamil Unicode encoding model altogether?
>
> A: No.

Q: Then how can we map text between the current Tamil Unicode encoding
model and a more "correct" sequence of units that reflects the way Tamil
script users think of their script?

A: By using the named sequences provided in
http://www.unicode.org/Public/5.1.0/ucd/NamedSequencesProv-5.1.0d1.txt.
The use of named sequences is described in UAX #34, "Unicode Named
Character Sequences."

(Note: the "provisional" named sequences for Tamil will probably need to
be upgraded to full approved status before users will take this advice
seriously.)
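
To make the mapping idea concrete, here is a minimal sketch, not a
definitive implementation. It assumes the data file uses one
"NAME;CODE POINTS" entry per line with "#" comments, and the three sample
entries are illustrative only and should be checked against the actual
file. The function names parse_named_sequences and segment are
hypothetical, and greedy longest-match is just one plausible way to pick
out the units.

```python
# Sample entries in the assumed "NAME;HEX HEX ..." format. These are
# illustrative; verify them against the real named-sequences file.
SAMPLE = """\
# name;code point sequence
TAMIL CONSONANT K;0B95 0BCD
TAMIL SYLLABLE KAA;0B95 0BBE
TAMIL SYLLABLE KI;0B95 0BBF
"""


def parse_named_sequences(text):
    """Return a dict mapping each code point sequence (as a str) to its name."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, codepoints = line.split(";", 1)
        seq = "".join(chr(int(cp, 16)) for cp in codepoints.split())
        table[seq] = name.strip()
    return table


def segment(text, table):
    """Greedy longest-match split of text into (unit, name or None) pairs."""
    longest = max(map(len, table), default=1)
    units = []
    i = 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                units.append((chunk, table[chunk]))
                i += size
                break
        else:
            # No named sequence starts here; pass the code point through.
            units.append((text[i], None))
            i += 1
    return units


TABLE = parse_named_sequences(SAMPLE)
# KA + AA, KA + VIRAMA, then a bare KA that matches no sample entry.
UNITS = segment("\u0b95\u0bbe\u0b95\u0bcd\u0b95", TABLE)
```

Because the units are only a view over the existing code points, joining
them back together reproduces the original text unchanged, which is the
point: no second encoding is needed to present Tamil in these terms.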

--
Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
http://www.ewellic.org
http://www1.ietf.org/html.charters/ltru-charter.html
http://www.alvestrand.no/mailman/listinfo/ietf-languages

