RE: minimizing size (was Re: allocation of Georgian letters)

From: Bala (bala@cse.mrt.ac.lk)
Date: Sat Feb 09 2008 - 11:19:14 CST


    James Kass----
    Another shame is telling Tamil users that Unicode won't standardize
    a duplicate encoding until a certain event happens. This gives the
    misleading impression that there's at least a possibility that Unicode
    might encode TACE/TUNE.

    It would have been much better, in my opinion, to simply have told people
    up front that there is absolutely no possibility whatsoever for such a
    duplicate encoding in the standard. In which case, the people who have
    spent time and effort towards such an encoding could have been doing
    something productive with their time and resources instead of wasting
    them. Like, for example, solving problems with the PDF format
    related to complex scripts.
    -----

    I attended the Chennai meeting last month. At the meeting, the UTC stated
    very clearly that dual encoding is not possible at any stage.
    They suggested a few other solutions in case TACE is to be used (such as
    IANA registration).

    Anyway, this does not mean that Tamil is a complex script. Present-day
    Tamil has a defined set of elements (326) used to build text. Among the
    Indic scripts, as I understand it, Sinhala has the most letters and Tamil
    the fewest. Except for Tamil, the other Indic scripts have combined
    (conjunct) forms, which produce combined letters and make the script
    complex. In Sinhala, a few thousand letters can be generated logically;
    some of them never appear in actual text, but logically such letters
    exist. In Tamil, however, there is no concept of combined letters at all.

    In Tamil, only one conjunct consonant (ksh) and one conjunct syllable
    (shrii) are presently used in text, and both are borrowed elements. This
    is why Tamil should not be considered a complex script, and why it could
    have been expected to be a Level 1 encoding in Unicode. However, Unicode
    was very clear at the Chennai meeting that dual encoding is not possible,
    and that the present encoding cannot be deprecated either.
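
    As a rough illustration (this is only my own sketch, nothing that was
    shown at the meeting), a few lines of Python show how the present Unicode
    encoding builds Tamil syllables as short sequences drawn from the element
    set, including the two borrowed conjuncts:

        # Sketch only: print the code points behind a few rendered Tamil
        # syllables. Each syllable is stored as a short sequence of
        # elements (consonant, pulli, vowel sign), not as one precomposed
        # character per syllable the way TACE would encode it.
        samples = {
            "\u0B95\u0BBF": "ka + vowel sign i -> ki",
            "\u0B95\u0BCD\u0BB7": "ka + pulli + ssa -> the borrowed ksha",
            "\u0BB8\u0BCD\u0BB0\u0BC0": "sa + pulli + ra + vowel sign ii -> shrii",
        }
        for text, note in samples.items():
            codepoints = " ".join(f"U+{ord(ch):04X}" for ch in text)
            print(f"{text}  {codepoints}  ({note})")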

    Thank you

    Kind Regards
    Bala

    -----Original Message-----
    From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On Behalf Of James Kass
    Sent: Saturday, February 09, 2008 3:25 PM
    To: Unicode Mailing List
    Subject: Re: minimizing size (was Re: allocation of Georgian letters)

    Doug Ewell wrote,

    > As much as I like BabelPad (it has replaced SC UniPad as my favorite
    > full-service-Unicode editor), I have had serious problems pasting text
    > into BabelPad from the clipboard. Sometimes there is a large chunk of
    > random text after the "real" data; there have been other symptoms as
    > well. I assume Andrew will be able to resolve these when he has a
    > chance to update the program.
    >
    > Except in the presence of bugs such as this, Unicode data can be copied
    > and pasted from one Unicode-aware program to another Unicode-aware
    > program with 100% fidelity, regardless of the encoding model.

    (Andrew responds well to reported problems, but how can he fix bugs
    in third-party PDF applications?)

    The operative phrase is "Unicode-aware application". I believe it would
    be possible to copy/paste text back and forth between BabelPad and
    Notepad until the mouse wore out without data corruption.

    PDF has long been touted as *the* way to safely send text with the
    assurance that the recipients will be able to display that text exactly
    as the author intended. While it's true that the recipient sees what
    was intended, it does not seem to be true that actual text is being
    sent. Once the material is in PDF format, no further text processing
    appears to be possible; the actual text has been lost somewhere along
    the way. (ASCII text notwithstanding.)

    Without any real knowledge of the PDF format and what happens when
    converting a file to PDF, it appears to me that it is not text which is
    being embedded. Rather, the process is embedding glyphs. If a glyph
    is mapped to a Unicode value, at least some applications can return that
    value. But, if the glyph is not mapped to a Unicode value (which is
    normally the case with presentation forms used in complex scripts),
    there does not seem to be any effort made to preserve the Unicode
    string which generated the presentation form. And that's really a
    shame.
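
    As a rough sketch of the symptom (assuming a hypothetical sample.pdf and
    a third-party extraction library, pdfminer.six here, chosen only for
    illustration), extraction recovers only those code points to which the
    PDF maps its embedded glyphs:

        # Sketch only: dump the code points of whatever text a PDF
        # exposes. Glyphs that carry a glyph-to-Unicode mapping come
        # back as real characters; unmapped presentation forms (the
        # usual case for complex scripts) come back garbled or lost.
        from pdfminer.high_level import extract_text

        extracted = extract_text("sample.pdf")  # hypothetical input file
        for ch in extracted[:40]:
            print(f"U+{ord(ch):04X}  {ch!r}")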

    Another shame is telling Tamil users that Unicode won't standardize
    a duplicate encoding until a certain event happens. This gives the
    misleading impression that there's at least a possibility that Unicode
    might encode TACE/TUNE.

    It would have been much better, in my opinion, to simply have told people
    up front that there is absolutely no possibility whatsoever for such a
    duplicate encoding in the standard. In which case, the people who have
    spent time and effort towards such an encoding could have been doing
    something productive with their time and resources instead of wasting
    them. Like, for example, solving problems with the PDF format
    related to complex scripts.

    Best regards,

    James Kass

    P.S. - There's a special FAQ page for Tamil encoding issues here:
    http://unicode.org/faq/tamil.html

    Suggested additions to that page might include:

    Q: Is there any possibility that a new character encoding scheme for
    Tamil which considers ligatures as characters will either be added to
    Unicode side-by-side with the existing Unicode Tamil encoding or
    replace the current Tamil Unicode encoding model altogether?

    A: No.


