Re: 28th IUC paper - Tamil Unicode New

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Aug 18 2005 - 20:44:19 CDT


    > See Annex 4 on Tamil Unicode New scheme:
    > http://www.infitt.org/ti2004/www/ed_rept.pdf

    This Annex seems to be making quite a number of rather
    baseless claims about "Unicode Tamil-New".

    Among them:

    "Its simplicity leads to enormous savings. The space
    requirement for a Unicode Tamil-New is about 40% less
    than what is required in the current Unicode Tamil. If
    we calculate the cumulative storage requirement of over
    60 million Tamils, it will be in thousands of crores
    of rupees every month."

    There is no hint regarding how this calculation is to
    be done. But it seems quite clear to me that the future
    cumulative storage requirement of "over 60 million Tamils"
    will be driven by the space required for software installation
    (including video games), images, sound, browser caches,
    memory caches, etc., and not at all by the alleged 40% advantage
    in plain text storage. I see no sign whatsoever, for
    example, that current storage requirements for *English*
    language users are driven in the slightest by the *50%*
    advantage that ASCII storage for English text enjoys over
    UTF-16 Unicode storage of the same text. The plain fact
    is that for all modern computer systems, the storage space
    needed for plain text is the merest drop in the bucket
    compared to the gigabytes needed to store everything else
    that is on people's computers.
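
    The 50% figure for English is easy to check. The following is a
    minimal sketch, not part of the Annex or of any measurement; the
    sample strings are arbitrary illustrations. It compares encoded
    sizes for an ASCII-only string, and also shows the per-code-point
    cost of current Unicode Tamil in UTF-16 and UTF-8.

```python
# Sample strings are illustrative assumptions, not data from the Annex.
english = "Plain text is the merest drop in the bucket."
# The word "Tamil" in Tamil script: 5 code points (U+0BA4 U+0BAE U+0BBF U+0BB4 U+0BCD).
tamil = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

# "utf-16-le" is used so that no 2-byte byte order mark is added to the count.
english_ascii = len(english.encode("ascii"))
english_utf16 = len(english.encode("utf-16-le"))

# Fraction of space saved by ASCII relative to UTF-16 for ASCII-only text.
ascii_saving = 1 - english_ascii / english_utf16

# Tamil code points are in the BMP: 2 bytes each in UTF-16, 3 bytes each in UTF-8.
tamil_utf16 = len(tamil.encode("utf-16-le"))
tamil_utf8 = len(tamil.encode("utf-8"))

print(f"English: {english_ascii} bytes ASCII, {english_utf16} bytes UTF-16")
print(f"ASCII saving over UTF-16: {ascii_saving:.0%}")
print(f"Tamil sample: {tamil_utf16} bytes UTF-16, {tamil_utf8} bytes UTF-8")
```

    The sketch says nothing about the proposed "Unicode Tamil-New"
    encoding itself; it only makes concrete the size of the ASCII
    advantage that English users already have and evidently do not
    notice.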

    And:

    "The Time and Cost required to communicate Tamil text in the
    Unicode Tamil-New encoding is about 40% less than in the
    current Unicode."

    Again, for any realistic data scenarios, this is just flat
    wrong. Most communication scenarios involve structured
    data, and the structure often outweighs the plain text
    content by a considerable margin.

    Most Tamil text communication on the internet will involve
    HTML, for example, and it is likely that it will be *more* efficient to
    do that directly in Unicode than it will be to do so with
    yet another font hack. The claim regarding communication
    efficiency completely discounts both the time and storage
    *in*efficiencies of having to pass around and install the
    extra fonts to deal with the extra encoding, and/or the need
    to embed more fonts in PDF documents and the like.
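
    To illustrate how structure can outweigh text content, here is a
    rough sketch that measures what share of an HTML document's bytes
    is actual text. The sample page, the TextCollector class, and the
    text_share function are hypothetical and made up for illustration;
    real pages will vary, and usually carry far more markup, script,
    and styling than this one.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the character data found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_share(html: str) -> float:
    """Return the fraction of the UTF-8 encoded page that is text content."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()  # flush any buffered character data
    text = "".join(chunk.strip() for chunk in collector.chunks)
    return len(text.encode("utf-8")) / len(html.encode("utf-8"))


# Hypothetical minimal page; the Tamil string is the word "Tamil" (5 code points).
TAMIL = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"
SAMPLE_PAGE = (
    '<html><head><meta charset="utf-8"><title>' + TAMIL + "</title></head>"
    '<body><div class="post"><p style="font-size: 12pt">' + TAMIL + "</p></div>"
    "</body></html>"
)

share = text_share(SAMPLE_PAGE)
print(f"Text content is {share:.0%} of the page bytes")
```

    Any saving in the text encoding applies only to that share of the
    bytes, so a 40% reduction in text size translates into a much
    smaller reduction in what is actually transmitted.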

    --Ken


