From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Sat May 07 2005 - 20:37:42 CDT
At 10:36 AM 5/7/2005, Doug Ewell wrote:
>It is possible to build an "OK" compressor or a "really good" compressor
>within the same spec. This is also true for some types of non-text
>compression.
Unlike with other compression schemes, the performance of a truly
sophisticated compressor and that of a basic compressor are often very
close, unless the basic compressor is written to be intentionally 'stupid'.
The main reason for allowing different compressors was to make sure that
certain types of strings, for example in Japanese, could utilize compressors
that were optimized for the particular mix of scripts found in that language.
For a scenario that's exclusively Japanese, the 10% (or so) improvement that
a more optimized compressor might yield could be important.
The UTS comes with two sets of sample code, one in Java and one in C. The
former implements a middle-of-the-road compression strategy, where some
optimization is attempted while the complexity of the code is kept
reasonable; the latter presents a very minimal, yet useful encoder.
See http://www.unicode.org/Public/PROGRAMS/ for the source code.
The difference in the performance of these two encoders would probably
not matter, except for really high-volume usage for certain types of
strings or languages.
The main reason is that the fundamental compression model is the same; the
difference lies in the lookahead and in the use of some optional features.
This is similar to the task of optimizing program code for speed: tweaking
the code tends to yield improvements of a few percent here or there, while
a change to a fundamental algorithm is what really improves things.
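To illustrate the point about lookahead, here is a minimal sketch in Python. It is a toy cost model, not the scheme from the UTS: the single 128-code-point window, the 3-byte cost of a window switch or a quoted character, and the names window_of and encoded_size are all assumptions made up for this example. Both encoders share the same model; the only difference is whether the encoder peeks ahead before giving up its current window.

```python
WINDOW = 128  # assumed window size for this toy model


def window_of(cp):
    """Return the start of the 128-code-point block containing cp."""
    return cp - cp % WINDOW


def encoded_size(text, lookahead=0):
    """Count output bytes under a toy single-window cost model.

    Assumed costs (hypothetical, for illustration only):
      - ASCII, or a character inside the active window: 1 byte
      - switching the window and emitting the character: 3 bytes
      - quoting one character without switching: 3 bytes
    With lookahead=0 the encoder always switches. With lookahead=k it
    quotes instead when more of the next k characters belong to the
    current window than to the new one.
    """
    cps = [ord(c) for c in text]
    active = None  # no window selected yet
    size = 0
    for i, cp in enumerate(cps):
        if cp < 0x80 or window_of(cp) == active:
            size += 1
            continue
        target = window_of(cp)
        # Only non-ASCII characters care which window is active.
        upcoming = [c for c in cps[i + 1:i + 1 + lookahead] if c >= 0x80]
        stay = sum(1 for c in upcoming if window_of(c) == active)
        move = sum(1 for c in upcoming if window_of(c) == target)
        if lookahead and stay > move:
            size += 3  # quote: the active window is left unchanged
        else:
            size += 3  # switch: later characters use the new window
            active = target
    return size


if __name__ == "__main__":
    sample = "Привет, α мир"  # Cyrillic text with one stray Greek letter
    print("no lookahead:", encoded_size(sample, lookahead=0))
    print("lookahead 4: ", encoded_size(sample, lookahead=4))
```

In this sketch the gain comes only from not switching away from a window that is about to be needed again, so text that stays within a single script encodes to the same size with or without lookahead.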
A./