Re: Unicode-capable compres

From: MGauthier (MGauthier@iit.nrc.ca)
Date: Fri Oct 21 1994 - 03:55:24 EDT

Next message: Wayne Pollock: "Re: Unicode-capable compression software"
Previous message: John R. Bennett: "Re: Unicode-capable compression software"

Subject: RE>>Unicode-capable compress
>I had assumed that traditional compression algorithms looked for repeats
>on an 8-bit basis and, hence, would fail to compress Unicode. Is this
>assumption correct/incorrect?

As was mentioned, Unicode data does compress relatively well with 8-bit
data compressors, though perhaps not quite as well as it would if the
compressor knew to expect two-byte entities. As a simple example, I took a
115k ASCII file and converted it to Unicode (prepending a NUL byte to every
character), making it 230k, then compressed both the ASCII and Unicode files
with gzip. The ASCII one came down to about 32k, the Unicode one to 39k. (I
would have expected better for the latter, though; the compression algorithm
could indeed be a bit smarter about this.)
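
For anyone who wants to repeat the experiment, here is a minimal sketch in Python. It is not the procedure used for the figures above: the helper names (widen_ascii, gzip_sizes) and the sample text are made up for illustration, and it assumes that "prepending a NUL" is equivalent to encoding ASCII as big-endian UTF-16 with no byte-order mark. The sample is highly repetitive, so the sizes it prints are not comparable to the 32k and 39k figures; substitute a real text file to get a meaningful comparison.

```python
import gzip


def widen_ascii(data: bytes) -> bytes:
    """Prepend a NUL byte to every ASCII byte.

    For pure ASCII input this yields the same bytes as UTF-16BE
    without a byte-order mark (an assumption about the conversion
    described in the message).
    """
    return data.decode("ascii").encode("utf-16-be")


def gzip_sizes(data: bytes) -> dict:
    """Return raw and gzip-compressed sizes for the ASCII and widened forms."""
    wide = widen_ascii(data)
    return {
        "ascii_raw": len(data),
        "ascii_gzip": len(gzip.compress(data)),
        "wide_raw": len(wide),
        "wide_gzip": len(gzip.compress(wide)),
    }


if __name__ == "__main__":
    # Placeholder input; replace with the contents of a real ASCII file.
    sample = ("Unicode data does compress relatively well.\n" * 500).encode("ascii")
    for name, size in gzip_sizes(sample).items():
        print(f"{name:>10}: {size} bytes")
```

The widened file is always exactly twice the size of the original, so any difference between the two compressed sizes reflects how well the byte-oriented compressor copes with the interleaved NUL bytes.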

-Marc

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:30 EDT