Compression and Unicode [was: Name Compression]

From: Juliusz Chroboczek
Date: Thu May 11 2000 - 01:20:14 EDT

Torsten Mohrin writes:

TM> In SC UniPad we use a compressed name table. The names are compressed
TM> by encoding the words either in one or two bytes. The separators
TM> (space and hyphen-minus) are encoded in a special way. It works as
TM> follows:

[explanation snipped]

Why not use Huffman encoding? You could precompute the Huffman tables
once and for all, compile them into your program, and only do the
actual encoding/decoding at runtime.

It would be a little bit more computationally expensive than your
scheme due to the need to access parts of bytes, but would yield a
much better compression ratio.
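
To make the suggestion concrete, here is a minimal sketch in Python. It is only an illustration, not UniPad's scheme or anyone's actual implementation. It assumes the Huffman symbols are single characters (words would work just as well), and it uses a tiny made-up sample of names where a real program would use the full name list. SAMPLE_NAMES, END, CODES, build_codes, encode and decode are all hypothetical names introduced for the example.

```python
import heapq
from collections import Counter

# Hypothetical sample; a real table would be computed from the full name list.
SAMPLE_NAMES = [
    "LATIN SMALL LETTER A",
    "LATIN CAPITAL LETTER A",
    "GREEK SMALL LETTER ALPHA",
    "HYPHEN-MINUS",
]
END = "\0"  # end-of-name marker, so padding bits are never read as symbols


def build_codes(names):
    """Return {symbol: bit string}, a Huffman code over the characters in names."""
    freq = Counter(ch for name in names for ch in name + END)
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing one bit to each side.
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


# "Precomputed once and for all": in a real program this table would be
# generated offline and compiled in as a constant.
CODES = build_codes(SAMPLE_NAMES)


def encode(name, codes):
    """Pack the Huffman bits of name (plus the end marker) into bytes."""
    bits = "".join(codes[ch] for ch in name + END)
    bits += "0" * (-len(bits) % 8)  # pad to a whole number of bytes
    return int(bits, 2).to_bytes(len(bits) // 8, "big")


def decode(data, codes):
    """Read data bit by bit until the end marker is found."""
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for byte in data:
        for shift in range(7, -1, -1):  # most significant bit first
            current += "1" if (byte >> shift) & 1 else "0"
            sym = inverse.get(current)
            if sym == END:
                return "".join(out)
            if sym is not None:
                out.append(sym)
                current = ""
    raise ValueError("missing end-of-name marker")
```

The bit-by-bit loop in decode is the "access parts of bytes" cost mentioned above. A call such as decode(encode("HYPHEN-MINUS", CODES), CODES) gives back the original name, but only names whose characters all occur in the sample can be encoded with this table.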

More generally, I get the impression that the Unicode community is
particularly keen on inventing /ad hoc/ compression schemes. I still
haven't heard a sound rationale for the existence of the SCSU. What's
wrong with patent-free variants of LZW?

