Re: Compression through normalization

From: Doug Ewell (
Date: Thu Nov 27 2003 - 16:01:44 EST


    Peter Kirk <peterkirk at qaya dot org> wrote:

    > Yes, the compressor can make any canonically equivalent change, not
    > just composing composition exclusions but reordering combining marks
    > in different classes. The only flaw I see is that the compressor does
    > not have to undo these changes on decompression; at least no other
    > process is allowed to rely on it having done so.

    I agree with Peter here. I don't think the burden should be on the
    decompressor to reverse any operation that the compressor performs,
    except for the compression itself. After all, if we are letting the
    compressor change the normalization form of the input text, the
    decompressor cannot possibly know what the original form was, and is in
    no position to try to re-create it.
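
    A minimal sketch of this point, assuming a hypothetical compressor that
    normalizes to NFC and then deflates the UTF-8 bytes. The function names
    and the choice of zlib are illustrative assumptions, not anything
    proposed in this thread. The decompressor only reverses the compression
    step, so the text it returns is canonically equivalent to the input but
    not necessarily identical to it:

```python
import unicodedata
import zlib


def compress(text: str) -> bytes:
    # Hypothetical compressor: normalize to NFC first, then deflate.
    normalized = unicodedata.normalize("NFC", text)
    return zlib.compress(normalized.encode("utf-8"))


def decompress(blob: bytes) -> str:
    # Reverses only the compression itself; it has no record of the
    # normalization form the original text was in.
    return zlib.decompress(blob).decode("utf-8")


def canonically_equivalent(a: str, b: str) -> bool:
    # Two strings are canonically equivalent when their NFD forms match.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


original = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT (decomposed)
restored = decompress(compress(original))

print(restored == original)                        # code points differ
print(canonically_equivalent(restored, original))  # still equivalent
```

    Any process that consumes the restored text can rely on canonical
    equivalence with the original, but not on getting the same code point
    sequence back.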

    I'm particularly concerned about having the compressor produce any
    normalization form other than NFC or NFD, such as the partial
    normalization Philippe originally described, or (most definitely) any
    form of so-called "normalization" that ignores the composition
    exclusions. The output of the compressor *is* Unicode text; it just
    happens to be in another format. It must follow all the conformance
    rules that normally apply to Unicode text.
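
    As an illustration of the composition exclusions, here is a short
    sketch using Python's standard unicodedata module. The example
    character, U+0958 DEVANAGARI LETTER QA, is my own choice and is not
    taken from this thread. It is listed in the composition exclusions, so
    conformant NFC leaves it decomposed, and a "normalization" that
    recomposed it would not be producing NFC:

```python
import unicodedata

precomposed = "\u0958"       # DEVANAGARI LETTER QA (a composition exclusion)
decomposed = "\u0915\u093c"  # DEVANAGARI LETTER KA + DEVANAGARI SIGN NUKTA

nfc_of_precomposed = unicodedata.normalize("NFC", precomposed)
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)

# Conformant NFC never produces U+0958: the excluded character is
# decomposed, and the decomposed sequence is left alone.
print(nfc_of_precomposed == decomposed)
print(nfc_of_decomposed == decomposed)

# A compressor that emitted U+0958 and labeled its output NFC would be
# emitting text that is not actually in NFC.
print(unicodedata.normalize("NFC", precomposed) == precomposed)
```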

    -Doug Ewell
     Fullerton, California
