Hans Aberg
Thu Sep 21 2006

    On 21 Sep 2006, at 08:13, Asmus Freytag wrote:

    > If you assume a large alphabet, then your compression gets worse,
    > even if the actual number of elements are few.

    Why would that be? In one compression method, one just does a
    frequency analysis of the characters used and encodes based on that,
    so the table needs entries only for the characters actually used.

    One way to do character compression is simply to do a frequency
    analysis and sort the characters by frequency, most frequent first,
    which gives a map code points -> code points (each character to its
    rank). Then apply to the result a variable-width character encoding
    that gives smaller widths to smaller non-negative integers, such as
    UTF-8. The compressed text then cannot be longer than its UTF-8
    encoding, not counting the map itself.
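
    As a rough illustration of the scheme just described, here is a
    minimal Python sketch. The function names (build_rank_map,
    rank_to_code_point, compress, decompress) are made up for this
    example, and two details are my own assumptions rather than part of
    the method as stated: ties in frequency are broken by code point,
    and ranks skip the surrogate range so that a standard UTF-8 encoder
    accepts them. The sketch also leaves open how the map would be
    stored alongside the payload.

```python
from collections import Counter


def build_rank_map(text):
    """Map each character used in text to its frequency rank (0 = most frequent)."""
    counts = Counter(text)
    # Sort by descending count; ties are broken by code point (an assumption
    # made here only so that the map is deterministic).
    ordered = sorted(counts, key=lambda ch: (-counts[ch], ord(ch)))
    return {ch: rank for rank, ch in enumerate(ordered)}


def rank_to_code_point(rank):
    """Turn a rank into a code point that UTF-8 is able to encode."""
    # Assumption: skip the surrogate range U+D800..U+DFFF, which a
    # conforming UTF-8 encoder rejects.
    return rank if rank < 0xD800 else rank + 0x800


def compress(text):
    """Return (payload bytes, rank map) for text."""
    rank_of = build_rank_map(text)
    remapped = "".join(chr(rank_to_code_point(rank_of[ch])) for ch in text)
    return remapped.encode("utf-8"), rank_of


def decompress(payload, rank_of):
    """Invert compress() given the same rank map."""
    char_of = {chr(rank_to_code_point(rank)): ch for ch, rank in rank_of.items()}
    return "".join(char_of[ch] for ch in payload.decode("utf-8"))


if __name__ == "__main__":
    sample = "привет, мир! привет!"
    payload, rank_of = compress(sample)
    print(len(sample.encode("utf-8")), "bytes as plain UTF-8")
    print(len(payload), "bytes after remapping")
    print(len(rank_of), "entries in the map")
    assert decompress(payload, rank_of) == sample
```

    For a text with at most 128 distinct characters, every rank is
    below 128, so each character costs one byte in the payload
    regardless of how large the original code points were; the map has
    one entry per character actually used.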

       Hans Aberg

