Re: Unicode & space in programming & l10n

From: Hans Aberg (haberg@math.su.se)
Date: Thu Sep 21 2006 - 09:40:02 CDT


    On 21 Sep 2006, at 14:26, Doug Ewell wrote:

    > Don't forget you need to store the frequency table along with the
    > compressed data, so the reader can reconstruct the table. That
    > could mitigate your compression somewhat.

    To keep the table size down, one can use a hybrid: give compression
    numbers to the N most frequent code points, then handle the rest by
    applying some character encoding followed by a byte compression
    scheme. The number N will then depend on how large the text body is:
    if it is sufficiently large, all code points might be given
    compression numbers.
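
    A minimal sketch of one way to read the hybrid described above, in
    Python. The names hybrid_encode and hybrid_decode are hypothetical,
    and the choice of UTF-8 as the character encoding and zlib as the
    byte compression scheme is an assumption made for illustration; the
    message names neither. The entropy coding of the compression numbers
    themselves (Huffman or similar) is left out.

```python
import zlib
from collections import Counter


def hybrid_encode(text, n):
    """Split text into compression numbers for the n most frequent code
    points and a zlib-compressed UTF-8 stream for all other code points."""
    # Only this table of (at most) n code points has to be stored with the data.
    table = [ch for ch, _ in Counter(text).most_common(n)]
    rank = {ch: i for i, ch in enumerate(table)}
    # One extra number meaning "take the next code point from the rare stream".
    escape = len(table)

    ranks = []
    rare = []
    for ch in text:
        if ch in rank:
            ranks.append(rank[ch])
        else:
            ranks.append(escape)
            rare.append(ch)

    # Assumption: UTF-8 plus zlib stand in for "some character encoding"
    # and "a byte compression scheme".
    rare_bytes = zlib.compress("".join(rare).encode("utf-8"))
    return table, ranks, rare_bytes


def hybrid_decode(table, ranks, rare_bytes):
    """Invert hybrid_encode."""
    rare = iter(zlib.decompress(rare_bytes).decode("utf-8"))
    escape = len(table)
    return "".join(next(rare) if r == escape else table[r] for r in ranks)
```

    With a small N the stored table stays short and most of the rare
    code points go through the byte compressor; once N reaches the
    number of distinct code points in the text, the escape number is
    never emitted and the rare stream is empty.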

       Hans Aberg


