Re: Unicode & space in programming & l10n

From: Doug Ewell (dewell@adelphia.net)
Date: Sun Sep 24 2006 - 17:46:20 CST

Next message: Philippe Verdy: "Re: Question about formatting numerals"

Previous message: Doug Ewell: "Re: Problem with SSI and BOM"
In reply to: John D. Burger: "Re: Unicode & space in programming & l10n"
Next in thread: Hans Aberg: "Re: Unicode & space in programming & l10n"

John D. Burger <john at mitre dot org> wrote:

> On the notion of analyzing the words in text, sorting by frequency,
> and assigning shorter code units to higher-frequency words for
> compression:
>
> This is typically not worth the effort - high-frequency words perforce
> are more likely to occur earlier in the text, and thus are given short
> code words with no such analysis needed. Moreover, not defining what
> a "word" is lets Ziv-Lempel and friends discover subwords and
> multi-word sequences automagically. They essentially do stemming
> without knowing anything about language at all.
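
To illustrate Burger's point, here is a minimal LZ78-style sketch in Python. LZ78 is an assumption chosen for illustration; neither poster names a specific LZ variant. The coder never looks for word boundaries. Each output pair extends a previously seen phrase by one character, so repeated fragments, whether parts of words or runs that span word boundaries, become dictionary entries once they recur.

```python
def lz78_compress(text: str) -> list[tuple[int, str]]:
    """Encode text as (prefix_index, next_char) pairs in LZ78 style."""
    # Index 0 is the empty phrase; each emitted pair defines a new phrase.
    dictionary: dict[str, int] = {"": 0}
    pairs: list[tuple[int, str]] = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            # Known phrase: keep extending it before emitting anything.
            phrase = candidate
        else:
            # Emit (longest known prefix, new char) and learn the new phrase.
            pairs.append((dictionary[phrase], ch))
            dictionary[candidate] = len(dictionary)
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it as prefix + last char.
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_decompress(pairs: list[tuple[int, str]]) -> str:
    """Rebuild the text, growing the same phrase table the encoder built."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)


if __name__ == "__main__":
    sample = "the cat sat on the mat " * 20
    pairs = lz78_compress(sample)
    assert lz78_decompress(pairs) == sample
    print(len(sample), "characters ->", len(pairs), "pairs")
```

Decoding rebuilds the same dictionary from the pairs themselves, so no dictionary is stored or transmitted. A real coder would also pack the indices into a bit stream, which this sketch omits.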

This was a special-purpose project that I rolled myself, where
compression happens only once and decompression happens repeatedly, and
where I elected to use a simpler and lighter-weight mechanism than LZ.
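
The post does not describe the mechanism Doug actually used, so the following is only a hypothetical sketch of the word-frequency scheme discussed at the top of the thread: tokenize once, rank tokens by frequency, and give the most frequent ones the shortest codes. The token pattern, the varint encoding, and the names compress and decompress are all assumptions. Because the codebook is built in a separate pass, compression is the expensive step and decompression is one table lookup per code, which suits a compress-once, decompress-many workload.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Alternate runs of word and non-word characters; joining them
    # back together reproduces the input exactly.
    return re.findall(r"\w+|\W+", text)


def encode_varint(n: int, out: bytearray) -> None:
    # 7 bits per byte, high bit set on all but the last byte, so
    # ranks 0..127 take one byte, 128..16383 take two, and so on.
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)


def compress(text: str) -> tuple[list[str], bytes]:
    tokens = tokenize(text)
    # Most frequent token gets rank 0 and therefore the shortest code.
    ranked = [tok for tok, _ in Counter(tokens).most_common()]
    rank = {tok: i for i, tok in enumerate(ranked)}
    out = bytearray()
    for tok in tokens:
        encode_varint(rank[tok], out)
    return ranked, bytes(out)


def decompress(ranked: list[str], data: bytes) -> str:
    tokens = []
    value = shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # more bytes follow for this code
        else:
            tokens.append(ranked[value])  # code complete: table lookup
            value = shift = 0
    return "".join(tokens)


if __name__ == "__main__":
    text = "the cat sat on the mat.\nthe end"
    ranked, data = compress(text)
    assert decompress(ranked, data) == text
    print(len(ranked), "distinct tokens,", len(data), "bytes of codes")
```

The ranked codebook has to be stored alongside the encoded bytes. For small inputs that overhead can outweigh the savings, a cost an adaptive LZ coder avoids by rebuilding its dictionary during decoding.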

> Also remember that compression ratio is not the only figure of merit -
> compression speed is also important.

Point well taken. My impression is that the approach I took, for its
limited purpose, is comparable to LZ in speed, but that's just a guess
since I haven't profiled either one.
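
Since neither approach had been profiled, a small timing harness is one way to replace the guess with measurements. This sketch assumes Python's standard zlib module (DEFLATE, i.e. LZ77 plus Huffman coding) as the LZ baseline and reports the compression ratio plus best-of-five per-call times for compression and decompression separately, since the workload described above weights decompression far more heavily. Any other coder can be passed to best_time the same way; no timings are claimed here.

```python
import timeit
import zlib


def best_time(func, *args, number: int = 50, repeat: int = 5) -> float:
    """Best-of-`repeat` average seconds per call of func(*args)."""
    times = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
    return min(times) / number


def profile_zlib(text: str) -> dict[str, float]:
    raw = text.encode("utf-8")
    packed = zlib.compress(raw)
    if zlib.decompress(packed) != raw:
        raise ValueError("zlib round trip failed")
    return {
        "ratio": len(packed) / len(raw),          # compressed / original size
        "compress_s": best_time(zlib.compress, raw),
        "decompress_s": best_time(zlib.decompress, packed),
    }


if __name__ == "__main__":
    sample = "the cat sat on the mat " * 5000
    for key, value in profile_zlib(sample).items():
        print(f"{key}: {value:.6g}")
```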

--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/
RFC 4645  *  UTN #14

This archive was generated by hypermail 2.1.5 : Sun Sep 24 2006 - 18:05:49 CST