Re: Compression rates of text data

From: Mirko Raner (raner@mathema.de)
Date: Mon Jun 23 1997 - 11:57:22 EDT


> From unicode Mon Jun 23 17:05 MET 1997
> Date: Mon, 23 Jun 1997 17:05:38 +0200 (MET DST)
> From: Unicode-Mailempfang <unicode>
> To: raner
> Subject: Compression rates of text data
>
>
> ----- Begin Included Message -----
>
> >From unicode@unicode.org Fri Jun 20 22:56 MET 1997
> Mime-Version: 1.0
> Content-Transfer-Encoding: 7bit
> X-Uml-Sequence: 2957 (1997-06-20 20:22:44 GMT)
> To: Multiple Recipients of <unicode@unicode.org>
> From: "Unicode Discussion" <unicode@unicode.org>
> Date: Fri, 20 Jun 1997 13:22:43 -0700 (PDT)
> Subject: Compression rates of text data
>
>
> Hi folks,
>
> Does anyone have experience or information on the
> compression of text data in Unicode? I would be
> interested to hear if any vendors compress text data
> stored in Unicode and how much space savings they
> have experienced. I would be interested in hearing
> how various compression routines do with respect to
> Unicode data.

Dear Randy,

There was an article about the "Reuters Compression
Scheme for Unicode" (RCSU) in the conference proceedings
for IUC 10 (Proceedings Part 2, Slot B12). There is also
a web document about RCSU somewhere.

The article contains very good statistics about the
results of applying RCSU compression to text documents
in several languages. There are also statistics about
RCSU followed by a secondary LZW compression.
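
For a rough feel of how an encoding step interacts with a
secondary LZ-style pass, here is a minimal Python sketch.
It is neither RCSU nor LZW: UTF-8 merely stands in for a
more compact encoding step, and zlib's DEFLATE (LZ77-based)
stands in for the secondary compressor. The function name
compression_report and the sample text are made up for
illustration, and the numbers it prints say nothing about
the figures in the IUC 10 article.

```python
import zlib


def compression_report(text: str) -> dict:
    """Return raw and DEFLATE-compressed byte counts for two encodings.

    Assumption: UTF-16 (big-endian, no BOM) models plain 16-bit
    Unicode storage; UTF-8 is only a stand-in for a compact
    encoding step, not an implementation of RCSU.
    """
    report = {}
    for name in ("utf-16-be", "utf-8"):
        raw = text.encode(name)
        # Level 9 asks zlib for its strongest (slowest) compression.
        packed = zlib.compress(raw, 9)
        report[name] = {"raw": len(raw), "compressed": len(packed)}
    return report


# Hypothetical sample: a short German sentence repeated many times.
sample = "Größere Textmengen lassen sich gut komprimieren. " * 200

for name, sizes in compression_report(sample).items():
    ratio = sizes["compressed"] / sizes["raw"]
    print(f"{name}: {sizes['raw']} -> {sizes['compressed']} bytes ({ratio:.1%})")
```

Running the same function over real documents in several
languages would give figures that are at least comparable
in kind to the statistics mentioned above.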

However, at our company (MATHEMA Software GmbH) a new,
more efficient compression scheme is being developed
which we will (hopefully) present at IUC 11. Special
transformations in this compression scheme provide for
optimal compression rates of secondary LZ-based
algorithms.

I know that a Java implementation of RCSU exists, but
unfortunately I didn't receive any replies to my
inquiries about it. So, if you - or anyone else - get
hold of sources or binaries of an RCSU implementation,
please let me know!

Best regards,

Mirko

(raner@mathema.de)

> Thanks in advance.
>
> Randy
> ---------------------------------------------------------
> Randolph S. Williams
> National Language Support     Voice: 919.677.8000
> SAS Institute Inc.            Fax:   919.677.4444
> Cary, NC 27513 USA            Email: sasrsw@wnt.sas.com
>

Mirko Raner
Software Developer
MATHEMA Software GmbH
Germany


