Re: UTN #31 and direct compression of code points

From: Doug Ewell (dewell@adelphia.net)
Date: Tue May 08 2007 - 00:00:17 CDT

    Richard Wordingham <richard dot wordingham at ntlworld dot com> wrote:

    > The algorithm given is clearly for compressing UTF-16 data. Look at
    > the sign test for three byte difference values. (It could be
    > adjusted/corrected to handle arbitrary codepoint differences.) I
    > wonder if SCSU would out-perform the algorithm on, say, Shavian.

    Shavian can be encoded extremely efficiently in SCSU: only one byte per
    character, plus three bytes of overhead (0B 60 08, an SDX tag and its
    two-byte argument) at the start of the stream to set up a dynamic window
    at U+10400, and one more byte (01, the SQ0 tag) to quote each U+00B7
    "namer dot." I doubt the simplified LZ method presented in UTN #31 can
    top this, but of course there's nothing like experimentation.
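
    As a rough illustration of the byte counts described above, here is a
    minimal Python sketch of that one encoding strategy. It is not a general
    SCSU encoder: the function name scsu_encode_shavian and the choice to
    reject anything other than Shavian, U+00B7, and plain ASCII are
    assumptions made for this example only.

```python
SDX = 0x0B  # SCSU tag: define an extended window and make it active
SQ0 = 0x01  # SCSU tag: quote a single character from window 0


def scsu_encode_shavian(text: str, window: int = 3) -> bytes:
    """Encode Shavian text in SCSU single-byte mode (illustrative sketch).

    Handles only Shavian letters, U+00B7, and basic ASCII; anything else
    raises ValueError, since a full encoder is out of scope here.
    """
    base = 0x10400                      # 128-code-point window covering Shavian
    offset = (base - 0x10000) // 0x80   # 13-bit window offset, here 8
    arg = (window << 13) | offset       # top 3 bits pick the window slot
    out = bytearray([SDX, arg >> 8, arg & 0xFF])  # 0B 60 08 for window 3

    for ch in text:
        cp = ord(ch)
        if base <= cp < base + 0x80:
            # Inside the active dynamic window: one byte in 0x80..0xFF.
            out.append(0x80 + (cp - base))
        elif cp == 0x00B7:
            # Namer dot lives in default dynamic window 0 (U+0080..U+00FF),
            # so quote it with SQ0 instead of switching windows.
            out += bytes([SQ0, 0xB7])
        elif cp in (0x09, 0x0A, 0x0D) or 0x20 <= cp < 0x80:
            # Tab, LF, CR and printable ASCII pass through unchanged.
            out.append(cp)
        else:
            raise ValueError(f"U+{cp:04X} is outside the scope of this sketch")
    return bytes(out)


sample = "\U00010450\u00B7\U00010451"
print(scsu_encode_shavian(sample).hex(" "))  # 0b 60 08 d0 01 b7 d1
```

    For n Shavian letters and d namer dots, this yields 3 + n + 2d bytes,
    which matches the overhead figures given above.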

    --
    Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
    http://users.adelphia.net/~dewell/
    http://www1.ietf.org/html.charters/ltru-charter.html
    http://www.alvestrand.no/mailman/listinfo/ietf-languages
    

