Re: Compression through normalization

From: Peter Kirk
Date: Wed Nov 26 2003 - 07:09:57 EST

    On 25/11/2003 16:38, Doug Ewell wrote:

    >Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
    >>So SCSU and BOCU-* formats are NOT general purpose compressors. As
    >>they are defined only in terms of stream of Unicode code points, they
    >>are assumed to follow the conformance clauses of Unicode. As they
    >>recognize their input as Unicode text, they can recognize canonical
    >>equivalence, and thus this creates an opportunity for them to consider
    >>if a (de)normalization or de/re-composition would result in higher
    >>compression (interestingly, the composition exclusion could be
    >>reconsidered in the case of BOCU-1 and SCSU compressed streams,
    >>provided that the decompression to code points will redecompose the
    >>excluded compositions).
    >I have to say, if there's a flaw in Philippe's logic here, I don't see
    >it. Anyone?
    >-Doug Ewell
    > Fullerton, California
    Yes, the compressor can make any canonically equivalent change: not just
    composing the composition exclusions but also reordering combining marks
    that have different combining classes. The only flaw I see is that the
    decompressor does not have to undo these changes; at least, no other
    process is allowed to rely on it having done so.
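
To make the two kinds of change concrete, here is a minimal sketch using Python's standard unicodedata module. The helper name canonically_equivalent and the sample strings are illustrative assumptions, not part of SCSU or BOCU-1; the sketch only checks canonical equivalence and does not implement either compression format.

```python
import unicodedata


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: two strings are canonically equivalent
    when their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Reordering combining marks with different combining classes:
# U+0323 COMBINING DOT BELOW (class 220), U+0301 COMBINING ACUTE ACCENT (class 230).
dot_then_acute = "a\u0323\u0301"
acute_then_dot = "a\u0301\u0323"
assert dot_then_acute != acute_then_dot                      # different code point sequences
assert canonically_equivalent(dot_then_acute, acute_then_dot)

# A composition exclusion: U+0958 DEVANAGARI LETTER QA decomposes to
# U+0915 U+093C, and NFC never recomposes it.
composed = "\u0958"
decomposed = "\u0915\u093C"
assert canonically_equivalent(composed, decomposed)
assert unicodedata.normalize("NFC", composed) == decomposed

# A compressor could pick whichever equivalent form encodes shorter;
# a receiver may only assume the output is canonically equivalent to the input.
assert len(composed) < len(decomposed)
```

A receiving process that needs one particular form should normalize the decompressed text itself rather than assume the decompressor restored the original code point sequence.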

    Peter Kirk (personal) (work)
