Re: 28th IUC paper - Tamil Unicode New

From: Kenneth Whistler (kenw@sybase.com)
Date: Thu Aug 18 2005 - 20:44:19 CDT

Next message: Gregg Reynolds: "Re: FW: Subj: Converting from UCS-2 to UTF-8"

Previous message: Dean Harding: "RE: FW: Subj: Converting from UCS-2 to UTF-8"
Maybe in reply to: N. Ganesan: "28th IUC paper - Tamil Unicode New"
Next in thread: Richard Wordingham: "Re: 28th IUC paper - Tamil Unicode New"
Reply: Richard Wordingham: "Re: 28th IUC paper - Tamil Unicode New"

> See Annex 4 on Tamil Unicode New scheme:
> http://www.infitt.org/ti2004/www/ed_rept.pdf

This Annex seems to be making quite a number of rather
baseless claims about "Unicode Tamil-New".

Among them:

"Its simplicity leads to enormous savings. The space
requirement for a Unicode Tamil-New is about 40% less
than what is required in the current Unicode Tamil. If
we calculate the cumulative storage requirement of over
60 million Tamils, it will be in thousands of crores
of rupees every month."

There is no hint regarding how this calculation is to
be done. But it seems quite clear to me that the future
cumulative storage requirement of "over 60 million Tamils"
will be driven by the space required for software installation
(including video games), images, sound, browser caches,
memory caches, etc., and not at all by the alleged 40% advantage
in plain text storage. I see no sign whatsoever, for
example, that current storage requirements for *English*
language users are driven in the slightest by the *50%*
advantage that ASCII storage for English text enjoys over
UTF-16 Unicode storage of the same text. The plain fact
is that for all modern computer systems, the storage space
needed for plain text is the merest drop in the bucket
compared to the gigabytes needed to store everything else
that is on people's computers.
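
To put rough numbers on that comparison, here is a small sketch in Python. The sample sentence and the storage totals are made up purely for illustration, and `overall_saving` is a hypothetical helper, not anything taken from the Annex.

```python
# Illustrative sketch only: the sentence and the storage figures are invented.
text = "The quick brown fox jumps over the lazy dog."

ascii_bytes = len(text.encode("ascii"))      # 1 byte per character
utf16_bytes = len(text.encode("utf-16-le"))  # 2 bytes per character, no BOM

# Fraction of plain-text storage saved by ASCII relative to UTF-16.
text_saving = 1 - ascii_bytes / utf16_bytes


def overall_saving(text_bytes, other_bytes, saving_on_text):
    """Fraction of *total* storage saved when only the text part shrinks."""
    return (text_bytes * saving_on_text) / (text_bytes + other_bytes)


# Hypothetical machine: 10 MB of plain text next to 10 GB of everything else.
assumed_text = 10 * 10**6
assumed_other = 10 * 10**9
total_saving = overall_saving(assumed_text, assumed_other, text_saving)

print(f"saving on the text itself: {text_saving:.0%}")
print(f"saving on total storage:   {total_saving:.4%}")
```

Under those assumed proportions, a 50% saving on the text itself works out to well under a tenth of a percent of the total.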

And:

"The Time and Cost required to communicate Tamil text in the
Unicode Tamil-New encoding is about 40% less than in the
current Unicode."

Again, for any realistic data scenarios, this is just flat
wrong. Most communication scenarios involve structured
data, and the structure often outweighs the plain text
content by a considerable margin.
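
A rough way to see this is to measure how much of a small structured message is actually text. The sketch below uses a made-up HTML fragment, assumes UTF-8 on the wire, and takes the Annex's 40% figure at face value for the text portion only; `TextCollector` and the fragment itself are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the character data between tags, ignoring the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# Invented fragment: two short Tamil strings wrapped in ordinary markup.
page = (
    '<html><head><meta charset="utf-8"><title>தமிழ்</title></head>'
    '<body><div class="post"><p class="greeting">வணக்கம்</p></div></body></html>'
)

collector = TextCollector()
collector.feed(page)
collector.close()
text = "".join(collector.chunks)

total_bytes = len(page.encode("utf-8"))
text_bytes = len(text.encode("utf-8"))   # Tamil code points take 3 bytes each
markup_bytes = total_bytes - text_bytes  # the tags here are plain ASCII

claimed_text_saving = 0.40  # the Annex's claim, applied to the text only
message_saving = claimed_text_saving * text_bytes / total_bytes

print(f"text: {text_bytes} bytes, markup: {markup_bytes} bytes")
print(f"saving on the whole message: {message_saving:.1%}")
```

Even in this tiny fragment the markup outweighs the text, so any saving on the text alone is diluted across the whole message. Real pages carry far more structure than this.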

Most Tamil text communication on the internet will involve
HTML, for example, and it is likely that it will be *more* efficient to
do that directly in Unicode than it will be to do so with
yet another font hack. The claim regarding communication
efficiency completely discounts both the time and storage
*in*efficiencies of having to pass around and install the
extra fonts to deal with the extra encoding, and/or the need
to embed more fonts in PDF documents and the like.

--Ken

