Re: Data compression

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed May 04 2005 - 16:37:52 CDT

    From: "N. Ganesan" <naa.ganesan@gmail.com>
    > May be if engineers here work with Arvind Thiagarajan,
    > a Tamil engineer, now in Singapore, they can compress
    > Unicode few orders of magnitude higher ?!
    >
    > http://in.rediff.com/money/2005/may/04spec.htm

    Lossless image compression (the focus of this article, applied to medical
    imagery) is completely off topic here. The techniques used to compress
    images are unrelated to those used to compress text (even though Unicode
    also needs lossless compression). Whatever technique he uses on images, it
    almost certainly exploits 2D gradient properties and represents those
    gradients with a probabilistic but lossless encoding. It cannot be used to
    compress Unicode text.

    Also, an image is a self-contained object, and no one really needs direct
    access to the value of an individual pixel. This is not the case for text,
    where one frequently needs to enumerate the abstract characters (code
    points) that make up a string.

    Tamil, for example, compresses very well with SCSU (nearly one encoded byte
    per code point). You could achieve better compression by compressing the
    whole text with a common dictionary-based compressor such as Lempel-Ziv,
    but you would then have difficulty enumerating the code points or
    accessing their values at a random position in the text.
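
    To make the trade-off concrete, here is a minimal sketch in Python. It is
    NOT a conforming SCSU encoder: it assumes a single fixed window over the
    Tamil block (U+0B80..U+0BFF) and passes ASCII through unchanged, whereas
    real SCSU defines several windows and tag bytes to switch between them.
    The names encode_window and code_point_at are made up for illustration,
    and zlib (DEFLATE) stands in for "a Lempel-Ziv style compressor".

```python
import zlib

TAMIL_BASE = 0x0B80  # first code point of the Unicode Tamil block


def encode_window(text: str) -> bytes:
    """Toy one-window encoder: exactly one byte per code point.

    ASCII U+0020..U+007F is passed through; Tamil code points are mapped
    to 0x80..0xFF. Anything else is rejected in this sketch.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if 0x20 <= cp < 0x80:
            out.append(cp)
        elif TAMIL_BASE <= cp < TAMIL_BASE + 0x80:
            out.append(0x80 + (cp - TAMIL_BASE))
        else:
            raise ValueError(f"U+{cp:04X} is outside the toy window")
    return bytes(out)


def code_point_at(encoded: bytes, index: int) -> int:
    """Direct access: the byte at `index` decodes on its own."""
    b = encoded[index]
    return b if b < 0x80 else TAMIL_BASE + (b - 0x80)


# The word "Tamil" in Tamil script (5 code points) plus a space, repeated.
text = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD " * 200

utf8 = text.encode("utf-8")    # 3 bytes per Tamil code point
windowed = encode_window(text)  # 1 byte per code point
deflated = zlib.compress(utf8)  # smaller still on repetitive text

print(len(text), len(utf8), len(windowed), len(deflated))

# The windowed form can be indexed directly ...
first = code_point_at(windowed, 0)

# ... while the deflated stream has to be decompressed before any
# code point can be read.
first_from_deflate = ord(zlib.decompress(deflated).decode("utf-8")[0])
```

    On repetitive input like this the deflated stream is the smallest of the
    three, but only the windowed form lets you read code point number i
    without decoding everything that precedes it.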

    There is absolutely NOTHING in this article about text compression, so it
    is useless and off topic here.


