Re: Unicode & space in programming & l10n

From: Hans Aberg (haberg@math.su.se)
Date: Thu Sep 21 2006 - 06:45:53 CDT


    On 21 Sep 2006, at 08:13, Asmus Freytag wrote:

    > If you assume a large alphabet, then your compression gets worse,
    > even if the actual number of elements are few.

    Why would that be? In one compression method, one simply performs a
    frequency analysis on the characters used and encodes based on that,
    so table entries are needed only for characters actually used.

    One way to do character compression is to perform a frequency
    analysis and sort the characters by frequency, which gives a map
    code points -> code points. Then apply to the remapped values a
    variable-width character encoding that assigns smaller widths to
    smaller non-negative integers, such as UTF-8. This compression
    method cannot do worse than UTF-8, since the most frequent
    characters receive the smallest code points and hence the shortest
    encodings.
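
    The scheme above can be sketched as follows. This is my own
    illustration, not code from the thread; it assumes the number of
    distinct characters stays below the surrogate range (U+D800), so
    every remapped code point is itself UTF-8-encodable.

    ```python
    from collections import Counter

    def remap_by_frequency(text):
        # Rank distinct characters by descending frequency; the most
        # frequent character receives the smallest new code point (0).
        ranked = [ch for ch, _ in Counter(text).most_common()]
        return {ch: i for i, ch in enumerate(ranked)}

    def compress(text):
        # Apply the frequency-based remapping, then use UTF-8 as the
        # variable-width encoding of the remapped code points.
        mapping = remap_by_frequency(text)
        remapped = "".join(chr(mapping[ch]) for ch in text)
        return mapping, remapped.encode("utf-8")
    ```

    With, say, "abracadabra", the five distinct characters remap to
    code points 0-4, all of which UTF-8 encodes in a single byte, so
    the output is never longer than the UTF-8 encoding of the original
    (the identity map would reproduce plain UTF-8 exactly).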

       Hans Aberg



    This archive was generated by hypermail 2.1.5 : Thu Sep 21 2006 - 06:48:26 CDT