email@example.com (Torsten Mohrin) writes:
TM> In SC UniPad we use a compressed name table. The names are compressed
TM> by encoding the words either in one or two bytes. The separators
TM> (space and hyphen-minus) are encoded in a special way. It works as
Why not use Huffman encoding? You could precompute the Huffman tables
once and for all, compile them into your program, and only do the
actual encoding/decoding at runtime.
It would be a little bit more computationally expensive than your
scheme due to the need to access parts of bytes, but would yield a
much better compression ratio.
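
A minimal sketch of the idea in Python, under some loud assumptions: the
four names in NAMES, the END marker, and the helper names build_codes,
encode and decode are all invented for illustration and are not UniPad's
actual table or code. Only the space separator is handled (hyphen-minus is
ignored), and the bits are kept as a string of '0'/'1' characters rather
than packed into bytes, so the partial-byte access mentioned above is left
out.

```python
import heapq
from collections import Counter
from itertools import count

# Hypothetical sample; a real table would cover the full Unicode name list.
NAMES = [
    "LATIN SMALL LETTER A",
    "LATIN CAPITAL LETTER A",
    "LATIN SMALL LETTER B",
    "GREEK SMALL LETTER ALPHA",
]
END = "<end>"  # assumed end-of-name marker, encoded like any other word


def build_codes(freqs):
    """Return {symbol: bit string} for a Huffman code over freqs."""
    tiebreak = count()  # unique counter so heap never compares the nodes
    heap = [(f, next(tiebreak), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Repeatedly merge the two least frequent nodes into one subtree.
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (a, b)))
    # Walk the tree: internal nodes are tuples, leaves are word strings.
    codes = {}
    stack = [(heap[0][2], "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix
    return codes


# This part would be precomputed once and compiled into the program.
FREQS = Counter(w for n in NAMES for w in n.split(" ") + [END])
CODES = build_codes(FREQS)
DECODE = {bits: sym for sym, bits in CODES.items()}


def encode(name):
    """Concatenate the codes of the words in name, followed by END."""
    return "".join(CODES[w] for w in name.split(" ") + [END])


def decode(bits):
    """Read bits until END; works because the code is prefix-free."""
    words, cur = [], ""
    for b in bits:
        cur += b
        if cur in DECODE:
            if DECODE[cur] == END:
                break
            words.append(DECODE[cur])
            cur = ""
    return " ".join(words)
```

With the tables fixed at build time, only encode and decode (or just
decode, for a read-only name table) need to run in the program itself.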
More generally, I get the impression that the Unicode community is
particularly keen on inventing /ad hoc/ compression schemes. I still
haven't heard a sound rationale for the existence of the SCSU. What's
wrong with patent-free variants of LZW?