Re: Unicode & space in programming & l10n

From: Hans Aberg (haberg@math.su.se)
Date: Thu Sep 21 2006 - 06:28:25 CDT


    On 21 Sep 2006, at 04:03, Steve Summit wrote:

    >> But this is all academic -- I don't see anyone taking the time
    >> and effort to investigate it in the absence of a compelling need.
    >
    > It would be pretty easy to do a small but convincing demonstration,
    > I think -- I've half a mind to do it right now.

    One can take a UTF-32 file, compress it using 32-bit words and
    about 2^17 table entries, and compare the result with regular 8-bit
    word compression. Or take a UTF-16 file whose characters all lie
    within the 2^16 range, use tables with 2^16 entries, and again
    compare with regular 8-bit word compression.
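
A rough sketch of the UTF-16 half of this comparison, in Python. The message does not name a compression algorithm, so as an assumption this uses an ideal order-0 entropy coder as a stand-in: one table entry per distinct symbol, with the cost of storing the table itself ignored. The function names `order0_bits` and `compare` are made up for illustration.

```python
import math
from collections import Counter


def order0_bits(symbols):
    """Bits an ideal order-0 entropy coder would need for `symbols`.

    Assumption: this is a stand-in for a real table-based compressor;
    the cost of transmitting the symbol table is not counted.
    """
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(n * math.log2(n / total) for n in counts.values())


def compare(text):
    """Return (bits using 16-bit symbols, bits using 8-bit symbols).

    Both figures are computed over the same UTF-16 encoding of `text`.
    """
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("sketch assumes all characters lie within the 2^16 range")
    data = text.encode("utf-16-le")
    # 16-bit view: one symbol per UTF-16 code unit (table of up to 2^16 entries).
    units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
    # 8-bit view: one symbol per byte (table of up to 2^8 entries).
    return order0_bits(units), order0_bits(data)


if __name__ == "__main__":
    wide, narrow = compare("abab")
    print(f"16-bit symbols: {wide:.1f} bits, 8-bit symbols: {narrow:.1f} bits")
```

A convincing demonstration would need a real compressor and a realistic corpus. This only shows how the two symbol widths can be measured on the same input.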

    Compression based on (natural language) words might do considerably
    better.

       Hans Aberg


