Re: minimizing size (was Re: allocation of Georgian letters)

From: Michael S. Kaplan (michka@trigeminal.com)
Date: Sat Feb 09 2008 - 12:58:09 CST

    They technically have been told this on more than one occasion by more than
    one person over the last 7-8 years.

    Some messages are harder to hear than others.

    MichKa

    ----- Original Message -----
    From: "Bala" <bala@cse.mrt.ac.lk>
    To: "'James Kass'" <thunder-bird@earthlink.net>; "'Unicode Mailing List'"
    <unicode@unicode.org>
    Sent: Saturday, February 09, 2008 9:19 AM
    Subject: RE: minimizing size (was Re: allocation of Georgian letters)

    > James Kass----
    > Another shame is telling Tamil users that Unicode won't standardize
    > a duplicate encoding until a certain event happens. This gives the
    > misleading impression that there's at least a possibility that Unicode
    > might encode TACE/TUNE.
    >
    > It would have been much better, my opinion, to simply have told people
    > up front that there is absolutely no possibility whatsoever for such a
    > duplicate encoding in the standard. In which case, the people who have
    > spent time and effort towards such an encoding could have been doing
    > something productive with their time and resources instead of wasting
    > them. Like, for example, solving problems with the PDF format
    > related to complex scripts.
    > -----
    >
    > I attended the Chennai meeting last month. The UTC stated very clearly
    > at the meeting that dual encoding is not possible at any stage.
    > They suggested a few other solutions in case TACE is to be used
    > (such as IANA).
    >
    > Anyway, this does not mean that Tamil is a complex script. In present-day
    > Tamil we have a defined set of elements (326) which are used to build the
    > text. Among the Indic languages, as I understand it, Sinhala has the most
    > letters and Tamil has the fewest. Except for Tamil, the other Indic
    > languages have combined forms which produce combined letters and make the
    > language complex. In Sinhala a few thousand letters can be logically
    > generated. Some of these letters are not used in text, but logically such
    > letters exist. In Tamil, however, there is no concept of combined letters
    > at all.
    >
    > In Tamil only 1 conjoint consonant (ksh) and 1 conjoint syllable (Shrii)
    > are presently used in text. These are entirely borrowed elements. This is
    > why Tamil should not be considered a complex script and is expected to be
    > a Level 1 encoding in Unicode. However, Unicode was very clear at the
    > Chennai meeting that dual encoding is not possible and that the present
    > encoding cannot be deprecated either.
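
For reference, the two borrowed conjuncts named above (ksh and Shrii) are written in the current Unicode Tamil encoding as sequences of code points, not as single characters. The Python sketch below lists those sequences. The exact spellings used for `KSH` and `SHRII` are an assumption, since the message names the conjuncts without giving code points, and `describe` is a hypothetical helper.

```python
import unicodedata

# Assumed spellings: the message names the conjuncts but not their code points.
KSH = "\u0b95\u0bcd\u0bb7"          # KA + VIRAMA + SSA
SHRII = "\u0bb8\u0bcd\u0bb0\u0bc0"  # SA + VIRAMA + RA + VOWEL SIGN II


def describe(text):
    """Return (code point, character name) pairs for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for label, seq in (("ksh", KSH), ("Shrii", SHRII)):
    print(label, "=", len(seq), "code points")
    for cp, name in describe(seq):
        print("   ", cp, name)
```

Each conjunct is rendered as one visual unit, but the stored text remains a string of ordinary Tamil letters joined by the virama.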
    >
    >
    > Thank you
    >
    > Kind Regards
    > Bala
    >
    > -----Original Message-----
    > From: unicode-bounce@unicode.org [mailto:unicode-bounce@unicode.org] On
    > Behalf Of James Kass
    > Sent: Saturday, February 09, 2008 3:25 PM
    > To: Unicode Mailing List
    > Subject: Re: minimizing size (was Re: allocation of Georgian letters)
    >
    >
    > Doug Ewell wrote,
    >
    >> As much as I like BabelPad (it has replaced SC UniPad as my favorite
    >> full-service-Unicode editor), I have had serious problems pasting text
    >> into BabelPad from the clipboard. Sometimes there is a large chunk of
    >> random text after the "real" data; there have been other symptoms as
    >> well. I assume Andrew will be able to resolve these when he has a
    >> chance to update the program.
    >>
    >> Except in the presence of bugs such as this, Unicode data can be copied
    >> and pasted from one Unicode-aware program to another Unicode-aware
    >> program with 100% fidelity, regardless of the encoding model.
    >
    > (Andrew responds well to reported problems, but how can he fix bugs
    > in third-party PDF applications?)
    >
    > The operative phrase is "Unicode-aware application". I believe it would
    > be possible to copy/paste text back-and-forth between BabelPad and
    > Notepad until the mouse wore out without data corruption.
    >
    > PDF has long been touted as *the* way to safely send text with the
    > assurance that the recipients will be able to display that text exactly
    > as the author intended. While it's true that the recipient sees what
    > was intended, it does not seem to be true that actual text is being
    > sent. Once the material is in PDF format, no further text processing
    > appears to be possible; the actual text has been lost somewhere along
    > the way. (ASCII text notwithstanding.)
    >
    > Without any real knowledge of the PDF format and what happens when
    > converting a file to PDF, it appears to me that it is not text which is
    > being embedded. Rather, the process is embedding glyphs. If a glyph
    > is mapped to a Unicode value, at least some applications can return that
    > value. But, if the glyph is not mapped to a Unicode value (which is
    > normally the case with presentation forms used in complex scripts),
    > there does not seem to be any effort made to preserve the Unicode
    > string which generated the presentation form. And that's really a
    > shame.
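
The loss described above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions: the glyph IDs, the `shape` and `extract` helpers, and the mapping tables are all hypothetical and are not real PDF structures or any library's API. The model stores glyph IDs instead of text, and extraction can recover only what the glyph-to-text table covers.

```python
# Toy model only: glyph IDs and tables are hypothetical, not real PDF structures.

KSSA = "\u0b95\u0bcd\u0bb7"     # Tamil conjunct: KA + VIRAMA + SSA
LIGATURE_GLYPH = "lig.kssa"     # hypothetical presentation-form glyph ID


def glyph_for(ch):
    """Hypothetical glyph ID for a single code point, e.g. 'u0BAA'."""
    return f"u{ord(ch):04X}"


def shape(text):
    """Turn text into glyph IDs, replacing the conjunct with one ligature glyph."""
    glyphs = []
    i = 0
    while i < len(text):
        if text.startswith(KSSA, i):
            glyphs.append(LIGATURE_GLYPH)
            i += len(KSSA)
        else:
            glyphs.append(glyph_for(text[i]))
            i += 1
    return glyphs


def extract(glyphs, glyph_to_text):
    """Recover text from glyph IDs; unmapped glyphs become U+FFFD."""
    return "".join(glyph_to_text.get(g, "\ufffd") for g in glyphs)


text = "\u0baa" + KSSA + "\u0bbf"   # PA + conjunct + VOWEL SIGN I
glyphs = shape(text)

# Table covering only glyphs that correspond to a single code point.
simple_map = {glyph_for(ch): ch for ch in text}
# Same table, plus the ligature glyph mapped back to its source string.
full_map = {**simple_map, LIGATURE_GLYPH: KSSA}

lossy = extract(glyphs, simple_map)
recovered = extract(glyphs, full_map)
print(lossy == text, recovered == text)
```

In this model the original string survives only when the presentation-form glyph carries a mapping back to the whole sequence that produced it.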
    >
    > Another shame is telling Tamil users that Unicode won't standardize
    > a duplicate encoding until a certain event happens. This gives the
    > misleading impression that there's at least a possibility that Unicode
    > might encode TACE/TUNE.
    >
    > It would have been much better, my opinion, to simply have told people
    > up front that there is absolutely no possibility whatsoever for such a
    > duplicate encoding in the standard. In which case, the people who have
    > spent time and effort towards such an encoding could have been doing
    > something productive with their time and resources instead of wasting
    > them. Like, for example, solving problems with the PDF format
    > related to complex scripts.
    >
    > Best regards,
    >
    > James Kass
    >
    > P.S. - There's a special FAQ page for Tamil encoding issues here:
    > http://unicode.org/faq/tamil.html
    >
    > Suggested additions to that page might include:
    >
    > Q: Is there any possibility that a new character encoding scheme for
    > Tamil which considers ligatures as characters will either be added to
    > Unicode side-by-side with the existing Unicode Tamil encoding or
    > replace the current Tamil Unicode encoding model altogether?
    >
    > A: No.
    >