Re: Unicode, SMS and year 2012

From: Richard Wordingham <richard.wordingham_at_ntlworld.com>
Date: Sat, 28 Apr 2012 18:55:00 +0100

On Fri, 27 Apr 2012 11:21:05 -0700
"Doug Ewell" <doug_at_ewellic.org> wrote:

> SCSU works equally well, or almost so, with any text sample where the
> non-ASCII characters fit into a single block of 128 code points. For
> anything other than Latin-1 you need one byte of overhead, to switch
> to another window, and for many scripts you need two, to define a
> window and switch to it. But again, two bytes is not what's holding
> anyone up.

With an SCSU encoder that avoids Unicode mode and UQU whenever possible,
most alphabetic languages work fairly well. However, extra windows are
needed to cover the half-blocks from A480 to ABFF, which would take 15
new window codes. If I were being miserly, I wouldn't cover A500-A5FF.
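
The count of 15 follows from a half-block being 128 code points. A
minimal sketch of that arithmetic is below; the helper name
half_blocks is made up for illustration and is not part of any SCSU
library.

```python
HALF_BLOCK = 0x80  # an SCSU window spans 128 code points


def half_blocks(first, last):
    """Return the start of every 128-code-point half-block that
    overlaps the inclusive range first..last."""
    start = first - first % HALF_BLOCK  # round down to a half-block boundary
    return list(range(start, last + 1, HALF_BLOCK))


# Half-blocks between A480 and ABFF inclusive.
print(len(half_blocks(0xA480, 0xABFF)))  # 15
```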

SCSU doesn't work well with large syllabaries, especially if they
include a lot of unused characters within the half-blocks used. Inuit
suffers badly from this, but still achieves noticeable compression. I
experimented with compressing Yi transposed to a covered range, and
found that it achieved something like 10% compression. Yi suffers from
needing the 8 dynamic windows to be switched between 10 half-blocks
(with occasional excursions to an 11th). If the Yi characters had
been arranged by tone first and initial consonant second, 2 of the
half-blocks would never have been used in my sample!
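
To see how much window pressure a sample puts on the encoder, one can
count the distinct half-blocks it touches and compare that with the 8
dynamic windows. The sketch below assumes every window is aligned on a
half-block boundary, which is a simplification, and the sample string
is a handful of arbitrary Yi code points, not real text. The names
half_block_usage and DYNAMIC_WINDOWS are hypothetical.

```python
from collections import Counter

DYNAMIC_WINDOWS = 8  # SCSU provides eight dynamic window slots


def half_block_usage(text):
    """Count non-ASCII characters per 128-code-point half-block,
    keyed by the half-block's first code point."""
    usage = Counter()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x80:  # ASCII needs no window switch
            usage[cp & ~0x7F] += 1  # clear the low 7 bits
    return usage


# Arbitrary Yi code points chosen for illustration only.
sample = "\uA000\uA085\uA10A\uA3F0\uA48C"
usage = half_block_usage(sample)
for start, count in sorted(usage.items()):
    print(f"U+{start:04X}: {count}")
print("needs window juggling:", len(usage) > DYNAMIC_WINDOWS)
```

A sample that touches more than 8 half-blocks forces the encoder to
keep redefining windows, which is where the overhead comes from.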

Vai (A500-A63F) fits in 3 half-blocks, and I would expect the non-Vai
characters in Vai text to fall in static windows. Given how well Yi
performed, I expect Vai to benefit from SCSU.

Has anyone investigated the performance of SCSU with Cuneiform or
Egyptian Hieroglyphs? It might achieve better than 50% compression!
A fair comparison for Egyptian Hieroglyphs depends on the mark-up
used, for Unicode on its own does not enable one to write reasonable
Middle Egyptian.

Richard.
Received on Sat Apr 28 2012 - 13:00:51 CDT