RE: Compression through normalization

From: D. Starner
Date: Wed Nov 26 2003 - 08:05:21 EST

    > In the case of GIF versus JPG, which are usually regarded as "lossless"
    > versus "lossy", please note that there /is/ no "original", in the sense
    > of a stream of bytes. Why not? Because an image is not a stream of
    > bytes. Period.

    GIF isn't a compression scheme; it uses the LZW compression scheme, like
    Unix compress, which is a byte-stream compressor. Also, if I take my
    data, encode it as bytes, and stick it into a GIF file with an arbitrary
    palette, I can get back exactly that data. But if I encode my data as 9-bit
    chunks and interpret those as Unicode code points (9 bits, because
    10 bits would get us undefined code points and 16 would get us surrogate
    code points), email it to someone, and the mailer automatically
    compresses it, I wouldn't consider that lossless if it didn't decompress
    to the same code points at the other side. And enough stuff in the real
    world will barf on combining characters, or at least perform suboptimally,
    that changing the normalization scheme could really cause problems.
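
    To make the 9-bit case concrete, here is a rough Python sketch. The
    function names and the padding rule are my own assumptions for
    illustration, not part of any standard; the only library call used is
    unicodedata.normalize. It packs bytes into 9-bit chunks, treats each
    chunk as a code point in U+0000..U+01FF, and shows that an NFD pass
    hands back different code points, so the payload is not recovered.

```python
import unicodedata


def bytes_to_text(data: bytes) -> str:
    """Pack bytes into 9-bit chunks, one code point (U+0000..U+01FF) each.

    Hypothetical scheme: the bit stream is zero-padded on the right so its
    length is a multiple of 9.
    """
    nbits = len(data) * 8
    pad = (-nbits) % 9
    bits = int.from_bytes(data, "big") << pad
    nchunks = (nbits + pad) // 9
    return "".join(
        chr((bits >> (9 * (nchunks - 1 - i))) & 0x1FF) for i in range(nchunks)
    )


def text_to_bytes(text: str, length: int) -> bytes:
    """Inverse of bytes_to_text; `length` is the original byte count."""
    bits = 0
    for ch in text:
        cp = ord(ch)
        if cp > 0x1FF:
            raise ValueError(f"U+{cp:04X} does not fit in a 9-bit chunk")
        bits = (bits << 9) | cp
    pad = len(text) * 9 - length * 8
    if not 0 <= pad < 9:
        raise ValueError("text length does not match the expected byte count")
    return (bits >> pad).to_bytes(length, "big")


payload = b"\x60\x00"              # first 9 bits are 0b011000000 = U+00C0
text = bytes_to_text(payload)      # U+00C0 followed by U+0000
renormalized = unicodedata.normalize("NFD", text)

# Untouched, the text round-trips to the original bytes.
round_trip_ok = text_to_bytes(text, len(payload)) == payload

# After NFD, U+00C0 has become U+0041 U+0300: a different sequence of code
# points, one of which no longer fits in 9 bits.
changed_by_nfd = renormalized != text
```

    The two strings are canonically equivalent as text, which is exactly
    why a text-level "compressor" might feel free to swap one for the
    other, but text_to_bytes raises ValueError on the renormalized string
    because U+0300 is outside the 9-bit range.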

    This archive was generated by hypermail 2.1.5 : Wed Nov 26 2003 - 08:43:33 EST