Re: Unicode-capable compres

From: MGauthier (MGauthier@iit.nrc.ca)
Date: Fri Oct 21 1994 - 03:55:24 EDT


Subject: RE>>Unicode-capable compress
>I had assumed that traditional compression algorithms looked for repeats
>on an 8-bit basis and, hence, would fail to compress Unicode. Is this
>assumption correct/incorrect?

As was mentioned, Unicode data does compress relatively well with 8-bit
data compressors, though perhaps not quite as well as it would if the
compressor knew to expect two-byte entities. As a simple example, I took a
115k ASCII file and converted it to Unicode (prepending a NUL byte to every
character), making it 230k, then compressed both the ASCII and Unicode
files with gzip. The ASCII one came down to about 32k, the Unicode one to
about 39k. (I would have expected better for the latter, though; the
compression algorithm could indeed be a bit smarter about this.)
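
For anyone who wants to try the same comparison, here is a rough sketch in
Python. It makes several assumptions: it uses Python's gzip module rather
than the gzip command-line tool, it treats "Unicode" as big-endian 16-bit
code units with no byte-order mark, and it substitutes a synthetic sample
(the hypothetical sample_text helper) for the original 115k file, which is
not available. The sizes it prints will therefore not match the 32k/39k
figures above.

```python
import gzip
import random


def widen(ascii_bytes: bytes) -> bytes:
    """Prepend a NUL byte to every character (big-endian 16-bit units, no BOM)."""
    return ascii_bytes.decode("ascii").encode("utf-16-be")


def compressed_sizes(ascii_bytes: bytes) -> dict:
    """Return raw and gzip-compressed sizes for the 8-bit and 16-bit forms."""
    wide = widen(ascii_bytes)
    return {
        "ascii_raw": len(ascii_bytes),
        "wide_raw": len(wide),
        "ascii_gz": len(gzip.compress(ascii_bytes)),
        "wide_gz": len(gzip.compress(wide)),
    }


def sample_text(n_words: int = 20000, seed: int = 0) -> bytes:
    """Hypothetical stand-in for a real ASCII file: seeded random common words."""
    words = ("the of and to in is that compress unicode byte "
             "data file gzip repeat table").split()
    rng = random.Random(seed)
    return " ".join(rng.choice(words) for _ in range(n_words)).encode("ascii")


if __name__ == "__main__":
    for name, size in compressed_sizes(sample_text()).items():
        print(f"{name:>10}: {size}")
```

To test a real file instead, read it in binary mode and pass the bytes to
compressed_sizes in place of sample_text().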

-Marc


