RE: Compression through normalization

From: Arcane Jill
Date: Wed Nov 26 2003 - 06:15:53 EST

    In the case of GIF versus JPG, which are usually regarded as "lossless"
    versus "lossy", please note that there /is/ no "original", in the sense
    of a stream of bytes. Why not? Because an image is not a stream of
    bytes. Period. What is being compressed here is a rectangular array of
    pixels, and that is what is being restored when the image is "viewed". I
    am not aware of ANY use of the GIF format to compress an arbitrary byte
    stream.

    So, by analogy, if the XYZ compression format (I made that up) claims to
    compress a sequence of Unicode glyphs, as opposed to an arbitrary byte
    stream, and can later reconstruct that sequence of glyphs exactly, then
    I argue that it has every right to be called "lossless", in the same
    manner that GIF is called "lossless", because /there is no original byte
    stream to preserve/.
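
    To make the analogy concrete, here is a minimal sketch, not a real
    format. It assumes Python's standard unicodedata and zlib modules
    stand in for the made-up XYZ compressor, and that the normalization
    form is NFC; the function names are hypothetical. The round trip does
    not give back the same bytes, but it does give back canonically
    equivalent text.

```python
import unicodedata
import zlib


def compress_normalized(text: str) -> bytes:
    """Hypothetical 'XYZ' compressor: normalize to NFC, encode, deflate."""
    nfc = unicodedata.normalize("NFC", text)
    return zlib.compress(nfc.encode("utf-8"))


def decompress(blob: bytes) -> str:
    """Inverse of the deflate/encode steps; normalization is not undone."""
    return zlib.decompress(blob).decode("utf-8")


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# 'e' followed by COMBINING ACUTE ACCENT (a decomposed sequence).
original = "e\u0301"
restored = decompress(compress_normalized(original))

# The byte streams differ, but the text is canonically equivalent.
byte_identical = restored.encode("utf-8") == original.encode("utf-8")
same_text = canonically_equivalent(original, restored)
```

    Whether that counts as "lossless" depends on which of the two
    comparisons you take to define the original.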


    > -----Original Message-----
    > From: Doug Ewell
    > Sent: Tuesday, November 25, 2003 7:09 PM
    > To: Unicode Mailing List; UnicoRe Mailing List
    > Subject: Re: Compression through normalization
    > Here's a summary of the responses so far:
    > * Philippe Verdy and Jill Ramonsky say YES, a compressor can
    > normalize, because it knows it is operating on Unicode character data
    > and can take advantage of Unicode properties.
    > * Peter Kirk and Mark Shoulson say NO, it can't, because all the
    > compressor really knows about is the byte stream, so it must be
    > preserved byte-for-byte.
    > * I'm still not sure, but I'm leaning toward NO.
