Reuters Compression Scheme for Unicode (RCSU)

From: Randy Williams (
Date: Tue Jul 01 1997 - 09:46:13 EDT


  Misha suggested I send my questions to the list...

> I was reviewing the RCSU paper from the UIC-9 proceedings.
>(I did not attend UIC-10 and do not have those proceedings.)
>I was wondering if anyone has any stats on UTF-8 for comparison
>purposes? How do LZW and RCSU compare with UTF-8 in
>terms of speed?
> Does anyone have any data on the size of UTF-8 vs. 16-bit Unicode? I realize
>that UTF-8 will be 50% of the size for characters in the 7-bit ASCII range
>and 150% of the size for Asian scripts with pure DBCS characters.
>It appears the RCSU paper has an idea of typical data,
>so how does that typical data measure up in size with UTF-8?
>I assume that the RCSU authors have some idea of "typical data", which is
>why they were able to conclude that UTF-8 was not good enough for
>their purposes.
> Thanks in advance.

> Randolph S. Williams
>National Language Support Voice:
>SAS Institute Inc. Fax:
> 919.677.4444
>Cary, NC 27513 USA Email:
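
The two size figures quoted in the message (50% for 7-bit ASCII, 150% for
pure double-byte Asian text) can be checked with a short script. This is
only a minimal sketch and not part of the original post: the function name
size_ratio and the sample strings are made up for illustration, and
"Unicode" is assumed to mean the 16-bit encoding (UTF-16 with no byte-order
mark). It says nothing about RCSU or LZW, and nothing about speed.

```python
def size_ratio(text: str) -> float:
    """UTF-8 byte count divided by 16-bit Unicode (UTF-16, no BOM) byte count.

    Hypothetical helper for illustration only.
    """
    if not text:
        raise ValueError("text must be non-empty")
    utf8_len = len(text.encode("utf-8"))
    # "utf-16-be" is big-endian UTF-16 and does not prepend a byte-order mark.
    utf16_len = len(text.encode("utf-16-be"))
    return utf8_len / utf16_len


# Illustrative sample strings (assumptions, not data from the RCSU paper).
samples = {
    "7-bit ASCII": "Reuters Compression Scheme for Unicode",
    "Japanese": "日本語のテキスト",
    "mixed": "RCSU と UTF-8",
}

for label, text in samples.items():
    print(f"{label}: UTF-8 is {size_ratio(text):.0%} of the 16-bit size")
```

Pure ASCII text gives 50% and text made up only of three-byte characters
gives 150%; anything mixed lands in between, which is why the answer for
"typical data" depends on what that data actually contains.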
