Re: Compression through normalization

From: Doug Ewell (dewell@adelphia.net)
Date: Tue Nov 25 2003 - 19:38:03 EST

    Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:

    > So SCSU and BOCU-* formats are NOT general purpose compressors. As
    > they are defined only in terms of streams of Unicode code points, they
    > are assumed to follow the conformance clauses of Unicode. As they
    > recognize their input as Unicode text, they can recognize canonical
    > equivalence, and thus this creates an opportunity for them to consider
    > if a (de)normalization or de/re-composition would result in higher
    > compression (interestingly, the composition exclusion could be
    > reconsidered in the case of BOCU-1 and SCSU compressed streams,
    > provided that the decompression to code points will redecompose the
    > excluded compositions).

    I have to say, if there's a flaw in Philippe's logic here, I don't see
    it. Anyone?
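
    To make the idea concrete, here is a rough Python sketch of what such
    a normalization-aware encoder might do: build the canonically
    equivalent spellings of the input and keep whichever one encodes
    smallest. The function names are hypothetical, and UTF-8 byte length
    is only a stand-in for a real SCSU or BOCU-1 encoder, which Python's
    standard library does not provide.

```python
import unicodedata


def utf8_size(text):
    # Assumption: UTF-8 byte count stands in for the size of an
    # SCSU or BOCU-1 encoding; swap in a real encoder to measure those.
    return len(text.encode("utf-8"))


def candidate_forms(text):
    """Return the input plus its NFC and NFD forms, keyed by label."""
    return {
        "original": text,
        "NFC": unicodedata.normalize("NFC", text),
        "NFD": unicodedata.normalize("NFD", text),
    }


def smallest_equivalent(text, size=utf8_size):
    """Pick the canonically equivalent spelling with the smallest size."""
    forms = candidate_forms(text)
    label = min(forms, key=lambda name: size(forms[name]))
    return label, forms[label]


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    sample = "re\u0301sume\u0301"  # "résumé" spelled with combining accents
    label, chosen = smallest_equivalent(sample)
    print(label, utf8_size(sample), utf8_size(chosen))
    print(canonically_equivalent(sample, chosen))
```

    Note that the chosen string is only guaranteed to be canonically
    equivalent to the input, not identical to it code point for code
    point. NFC also never produces the excluded compositions, so the
    parenthetical about composition exclusions is not covered here.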

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


