RE: Compression through normalization

From: Philippe Verdy
Date: Tue Nov 25 2003 - 17:35:15 EST

    Doug Ewell writes:
    > * Philippe Verdy and Jill Ramonsky say YES, a compressor can
    > normalize, because it knows it is operating on Unicode character data
    > and can take advantage of Unicode properties.

    I say YES only for compressors that are meant to work on Unicode text
    (this applies to BOCU-1 and SCSU, which are not intended to compress anything
    other than Unicode text), but of course NO for general-purpose compressors
    (like deflate in zip files).
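
    To make the distinction concrete, here is a minimal Python sketch. The
    functions unicode_aware_compress and unicode_aware_decompress are
    hypothetical illustrations, not SCSU or BOCU-1; they only assume that a
    Unicode-only compressor is allowed to normalize to NFC before encoding.

```python
import unicodedata
import zlib

# "e" followed by COMBINING ACUTE ACCENT: the decomposed spelling of "é".
text = "e\u0301"

# General-purpose compressor (deflate via zlib): it only sees bytes,
# so the round trip must be byte-for-byte identical.
raw = text.encode("utf-8")
assert zlib.decompress(zlib.compress(raw)) == raw

# Hypothetical Unicode-only compressor that normalizes before encoding.
def unicode_aware_compress(s: str) -> bytes:
    normalized = unicodedata.normalize("NFC", s)
    return zlib.compress(normalized.encode("utf-8"))

def unicode_aware_decompress(data: bytes) -> str:
    return zlib.decompress(data).decode("utf-8")

restored = unicode_aware_decompress(unicode_aware_compress(text))

# The result is canonically equivalent to the input, but it is not the
# same sequence of code points.
same_code_points = restored == text
canonically_equivalent = (
    unicodedata.normalize("NFC", restored) == unicodedata.normalize("NFC", text)
)
```

    Whether losing the original code point sequence is acceptable is exactly
    the question under debate; the sketch only shows what changes.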

    I will say NO for encoding forms that are normally built to be directly
    parsable code point by code point, in either direction and from random
    locations in strings. So a UTF encoding scheme is not supposed to change the
    normalization form.
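
    As a small check of that expectation, the following sketch (using
    Python's built-in UTF codecs as an assumption about how a conforming
    implementation behaves) round-trips both a composed and a decomposed
    string and verifies that neither is altered.

```python
import unicodedata

composed = "\u00e9"                                  # single code point
decomposed = unicodedata.normalize("NFD", composed)  # "e" + U+0301

# Each UTF round trip must hand back exactly the code points it was given,
# whatever normalization form the input happens to be in.
results = {}
for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    for label, s in (("composed", composed), ("decomposed", decomposed)):
        results[(codec, label)] = s.encode(codec).decode(codec) == s
```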

    > * Peter Kirk and Mark Shoulson say NO, it can't, because all the
    > compressor really knows about is the byte stream, so it must be
    > preserved byte-for-byte.

    But SCSU and BOCU-1 do not operate at the byte stream level: their use is
    invalid on random streams of bytes, and they are defined only in terms of streams of
    code units... That's why I won't say that SCSU and BOCU-1 are really
    compressors, but rather really encoding schemes (CES in the ISO10646

    In fact, the BOCU-1 and SCSU encoding schemes produce a file that has
    its own charset (i.e. CCS+CES in the ISO terminology), and that can thus
    also have its own label for MIME usage or in XML charset declarations.
    This is not a limitation, as true compressors can still be applied if needed
    on top of this encoding scheme, or transparently within transport layers
    (such as the "Content-Transfer-Encoding:" in MIME and HTTP applications).
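
    The layering can be sketched as follows. Python's standard library has
    no SCSU or BOCU-1 codec, so UTF-16BE is used here purely as a stand-in
    for the charset layer, and the names CHARSET, send and receive are
    hypothetical. The point is only that the byte-level compressor sits on
    top of the encoding scheme and knows nothing about the text.

```python
import zlib

# Stand-in for the charset layer (an assumption: SCSU or BOCU-1 would
# take this role if a codec for them were available).
CHARSET = "utf-16-be"

def send(text: str) -> bytes:
    encoded = text.encode(CHARSET)   # charset layer: code points -> bytes
    return zlib.compress(encoded)    # transport layer: bytes -> fewer bytes

def receive(payload: bytes) -> str:
    encoded = zlib.decompress(payload)  # undo the transport layer exactly
    return encoded.decode(CHARSET)      # undo the charset layer

message = "Compression through normalization " * 20
payload = send(message)
round_tripped = receive(payload)
```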

    > * I'm still not sure, but I'm leaning toward NO.


    This archive was generated by hypermail 2.1.5 : Tue Nov 25 2003 - 18:31:44 EST