Re: Proposing UTF-21/24

From: Frank Ellermann (nobody@xyzzy.claranet.de)
Date: Mon Jan 22 2007 - 11:58:01 CST


    Doug Ewell wrote:

    > The greatest roadblock to acceptance of SCSU is its *perception* of
    > complexity. It is not nearly as complicated as it is perceived to be,
    > and I say this having implemented both simple and optimized encoders as
    > well as decoders. Algorithms like MD5 and Punycode and gzip are quite a
    > bit more complex

    Wait a moment, I've implemented MD5, UTF-1, UTF-7, and BOCU-1, and so far
    I have given up on SCSU. To say that it's horrible would be putting it mildly.

    One of the nice features of BOCU-1 is that a single error destroys at most
    one line. With UTF-8 a single error destroys at most one code point. Try
    that with SCSU, with its various ways of encoding the same piece of text.
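    To make the UTF-8 claim concrete, here is a minimal Python sketch (my own
    illustration, not code from any of the encoders mentioned here): corrupt
    one byte of "Grüße" and decode with errors="replace". Only the damaged
    character turns into U+FFFD; its neighbours survive because UTF-8 lead and
    continuation bytes can always be told apart. BOCU-1's per-line
    resynchronization and SCSU are not modelled.

```python
# Minimal sketch: one corrupted byte in UTF-8 damages only one code point,
# because the decoder resynchronizes at the next valid lead byte.
original = "Grüße"
data = bytearray(original.encode("utf-8"))  # b'Gr\xc3\xbc\xc3\x9fe'
data[2] = 0xFF  # corrupt the lead byte of 'ü'; 0xFF never occurs in valid UTF-8
decoded = data.decode("utf-8", errors="replace")

# 'G', 'r' before the error and 'ß', 'e' after it come through unchanged;
# the damaged 'ü' becomes replacement characters (one per invalid byte).
print(repr(decoded))
```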

    > BOCU-1 is less complex, but more obscure

    Not at all, it's a rather smart application of the 3*7 bits idea discussed
    in this thread. At some point it uses 1114111 = 2**20 + 2**16 - 1 as the
    biggest possible "jump".
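    This message doesn't spell out the 3*7 bits idea, so the sketch below
    assumes it means carrying a 21-bit code point in three 7-bit groups. The
    function names and the group order are hypothetical, and this is NOT
    BOCU-1's actual byte layout. It also checks the arithmetic above:
    2**20 + 2**16 - 1 = 1114111 = 0x10FFFF, the highest Unicode code point,
    which needs exactly 21 bits.

```python
MAX_CODE_POINT = 0x10FFFF

# The "jump" value quoted above is the highest Unicode code point,
# and it fits exactly in 21 = 3 * 7 bits.
assert MAX_CODE_POINT == 2**20 + 2**16 - 1 == 1114111
assert MAX_CODE_POINT.bit_length() == 21


def to_7bit_groups(cp: int) -> list[int]:
    """Split a code point into three 7-bit groups, most significant first.

    Hypothetical layout for illustration only, not BOCU-1's real encoding.
    """
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    return [(cp >> 14) & 0x7F, (cp >> 7) & 0x7F, cp & 0x7F]


def from_7bit_groups(groups: list[int]) -> int:
    """Reassemble a code point from three 7-bit groups (inverse of the above)."""
    hi, mid, lo = groups
    return (hi << 14) | (mid << 7) | lo


print(to_7bit_groups(0x20AC))    # EURO SIGN
print(to_7bit_groups(0x10FFFF))  # largest code point
```

    Each group needs only 7 bits, so a real encoding built on this idea would
    have the eighth bit of every byte free for flags or resynchronization.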

    > an additional problem: its core algorithm is covered under a U.S.
    > patent (6,737,994) owned by IBM. Although they currently offer a
    > royalty-free license, IBM has been known to change their terms of
    > licensing from time to time.

    So far they haven't told me that my BOCU-1 script needs a license; okay,
    that's no serious objection. IMO nobody needs a special compression scheme
    for Unicode anyway. But in theory BOCU-1 is nice, especially compared with
    SCSU.

    > memories of the Unisys GIF patent are still too fresh in my mind.

    The LZW patent has now expired worldwide. Even before that, it was possible
    to create uncompressed GIFs; http://purl.net/xyzzy/pub/clear1x1.gif
    (45 bytes) vs. clearlzw.gif (43 bytes) is an admittedly silly example.

    Frank



    This archive was generated by hypermail 2.1.5 : Mon Jan 22 2007 - 12:07:05 CST