From: Doug Ewell (dewell@adelphia.net)
Date: Sat May 07 2005 - 18:10:39 CDT
Peter Kirk <peterkirk at qaya dot org> wrote:
>> All text compression schemes must be lossless.
>
> I would suppose that a text compression scheme which treated
> canonically equivalent sequences as identical (and made use of that
> for slightly improved compression) would be acceptable, although
> technically (at least at the byte level) not lossless.
We had an interesting discussion about this on the list while I was
finishing up UTN #14. (See the section titled "Compression through
normalization.") It turned out not to be completely obvious whether
converting the input to a different normalization form constitutes
"changing" it. There are some reasons why it might be undesirable for a
compression process to change the exact code points, so normalization
probably should not be done as part of compression unless there is a
prior agreement in place.
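
To make the distinction concrete, here is a minimal sketch in Python. It is
an illustration only, not something taken from UTN #14: the helper names
canonically_equivalent and normalizing_round_trip are hypothetical, and
zlib merely stands in for whatever compressor is in use. It shows two
canonically equivalent strings that differ at the code point and byte
level, and a round trip that normalizes to NFC before compressing and
therefore does not return the original code points.

```python
import unicodedata
import zlib


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def normalizing_round_trip(text: str) -> str:
    """Hypothetical compressor that normalizes to NFC before compressing.

    zlib is only a stand-in; the point is the normalization step.
    """
    normalized = unicodedata.normalize("NFC", text)
    payload = zlib.compress(normalized.encode("utf-8"))
    return zlib.decompress(payload).decode("utf-8")


composed = "\u00e9"      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

# Same abstract text, different code points and different UTF-8 bytes.
same_text = canonically_equivalent(composed, decomposed)
same_code_points = composed == decomposed

# The round trip preserves canonical equivalence but not the exact input.
restored = normalizing_round_trip(decomposed)
exact_match = restored == decomposed
```

Under these assumptions same_text is true while same_code_points and
exact_match are false, which is the sense in which such a scheme would be
lossless with respect to canonical equivalence but not at the byte level.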
--
Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/