Re: SCSU implementations

From: Doug Ewell (dewell@compuserve.com)
Date: Tue Apr 11 2000 - 10:49:06 EDT


Asmus Freytag <asmusf@ix.netcom.com> wrote:

>> By using all eight windows/registers, I could sometimes beat the Java
>> reference code in terms of output SCSU when the source text used more
>> than one range of Unicode outside of the predefined ranges.
>
> I would love to learn of a realistic scenario where this situation
> occurs. It's trivially possible to generate test strings that work to
> the strengths (or design features) of a given compressor, but the task
> is to create one that works well with real-life text.

Well, Adrian did say "sometimes."
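
For anyone who hasn't dug into the TR, the gain Adrian is describing
comes from spreading newly encountered ranges across all eight dynamic
windows instead of redefining the same one over and over. A rough Java
sketch of that selection step might look like the following; the class
and field names and the round-robin replacement policy are purely
illustrative (this is not Adrian's code or the reference encoder's),
and switching back to an already-defined window with an SCn tag is
left out:

    // Sketch: pick a dynamic window for character c, defining a new one
    // (SDn tag + offset-index byte) when none of the eight covers it.
    // Default offsets and tag values are from the TR; everything else is
    // made up for illustration.
    class WindowPicker {
        static final int SD0 = 0x18;   // "define dynamic window 0" tag
        int[] offset = { 0x0080, 0x00C0, 0x0400, 0x0600,
                         0x0900, 0x3040, 0x30A0, 0xFF00 };  // TR defaults
        int next = 0;                  // which window to recycle next

        int pick(int c, java.io.ByteArrayOutputStream out) {
            for (int w = 0; w < 8; w++)        // reuse a window that already fits
                if (c >= offset[w] && c < offset[w] + 0x80)
                    return w;
            int w = next;                      // otherwise recycle one of the eight
            next = (next + 1) % 8;
            offset[w] = c & ~0x7F;             // align the new window to a half-block
            out.write(SD0 + w);                // SDn defines and selects window w
            out.write(c >> 7);                 // offset index; only valid below U+3400,
                                               // the full index table is in the TR
            return w;
        }
    }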

I just added limited use of static windows to my encoder and found that
it encodes the Japanese example in the TR in 181 bytes, which is not as
good as the 178 shown in the TR, but BETTER than the 182 that the "more
details" page claims for the reference encoder. (Is this a problem in
the report, that these two references disagree?) As far as I can tell,
I have not optimized my encoder for Japanese text to any great extent.
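
For reference, the static windows are the eight fixed half-blocks
listed in the TR (offsets 0x0000, 0x0080, 0x0100, 0x0300, 0x2000,
0x2080, 0x2100, 0x3000), reached by quoting a single character with an
SQn tag, so an isolated punctuation, currency, or CJK symbol costs two
bytes without disturbing the dynamic windows. In rough Java terms the
test looks something like this (the names are illustrative only, not
taken from my actual encoder):

    // Sketch: quote one character from an SCSU static window (SQn tag
    // followed by a byte below 0x80).  Offsets and tag values are from
    // the TR.  An encoder would normally try this only for a character
    // outside ASCII and outside the active dynamic window.
    class StaticQuote {
        static final int[] STATIC_OFFSET = {
            0x0000, 0x0080, 0x0100, 0x0300,  // tag quoting/ASCII, Latin-1, Latin Ext-A, combining marks
            0x2000, 0x2080, 0x2100, 0x3000   // punctuation, currency, letterlike, CJK symbols
        };

        static boolean quoteStatic(int c, java.io.ByteArrayOutputStream out) {
            for (int w = 0; w < 8; w++) {
                if (c >= STATIC_OFFSET[w] && c < STATIC_OFFSET[w] + 0x80) {
                    out.write(0x01 + w);              // SQ0..SQ7 are bytes 0x01..0x08
                    out.write(c - STATIC_OFFSET[w]);  // position within the window (< 0x80)
                    return true;
                }
            }
            return false;                             // caller falls back to other strategies
        }
    }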

Naturally, I was delighted to see such good compression for the
relatively small amount of work I put in.

I have also tested my code with Frank da Cruz's UTF-8 test page and
with the "Tenth International Unicode Conference" UTF-8 page on the
Unicode Web site, but since I have not compiled the reference encoder
(no Java compiler at hand), I can't compare my encoder's performance
against it.

-Doug Ewell
 Fullerton, California
