Re: Unicode & space in programming & l10n

From: John D. Burger
Date: Fri Sep 22 2006 - 21:28:06 CDT

    On the notion of analyzing the words in text, sorting by frequency,
    and assigning shorter code units to higher-frequency words for
    compression:

    This is typically not worth the effort - high-frequency words are
    perforce more likely to occur early in the text, so an adaptive
    dictionary coder gives them short code words with no such analysis
    needed. Moreover, not defining what a "word" is lets Ziv-Lempel and
    friends discover subwords and multi-word sequences automagically.
    They essentially do stemming without knowing anything about language
    at all.
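
    To make the point concrete, here is a minimal sketch of an
    LZ78-style parse in Python. It is only an illustration, not any
    particular compressor's implementation; the function names and the
    sample sentence are made up for this example. The parser is never
    told what a word is, yet repeated words and word sequences end up
    as dictionary phrases of their own.

```python
def lz78_parse(text):
    """Parse text into LZ78 (prefix_index, char) pairs.

    Returns the pairs and the phrase dictionary that was built.
    Index 0 stands for the empty phrase.
    """
    dictionary = {"": 0}
    pairs = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            # Keep extending the longest phrase already seen.
            phrase += ch
        else:
            # Emit (known prefix, new char) and remember the new phrase.
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        # Flush a trailing phrase that is already in the dictionary.
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs, dictionary


def lz78_decode(pairs):
    """Rebuild the original text from (prefix_index, char) pairs."""
    phrases = [""]
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        phrases.append(phrase)
        out.append(phrase)
    return "".join(out)


# Hypothetical sample text: a few frequent words, no tokenizer involved.
sample = "the cat and the dog and the bird " * 20
pairs, dictionary = lz78_parse(sample)

# Show the longest phrases the parser discovered on its own.
for phrase in sorted(dictionary, key=len, reverse=True)[:5]:
    print(repr(phrase))
```

    The longest phrases printed at the end span word boundaries, which
    is the "multi-word sequences" effect described above.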

    Also remember that compression ratio is not the only figure of merit
    - compression speed is also important.
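
    A rough way to see the trade-off is to time Python's standard
    zlib module at different compression levels. This is a sketch
    under assumptions: the helper name and the sample data are
    invented for the example, and timings will vary by machine, so no
    particular numbers are claimed.

```python
import time
import zlib


def measure(data, level):
    """Return (compressed_size, seconds) for zlib at the given level."""
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return len(packed), elapsed


# Hypothetical sample input: repetitive English-like text.
sample = ("high-frequency words occur early and often in the text " * 5000).encode("utf-8")

for level in (1, 6, 9):
    size, seconds = measure(sample, level)
    print(f"level {level}: {size} bytes, {seconds:.4f} s")
```

    Higher levels spend more time searching for matches; whether the
    extra ratio is worth it depends on the application.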

    - John Burger
