Re: Proposing UTF-21/24

From: Frank Ellermann
Date: Mon Jan 22 2007 - 11:58:01 CST


    Doug Ewell wrote:

    > The greatest roadblock to acceptance of SCSU is its *perception* of
    > complexity. It is not nearly as complicated as it is perceived to be,
    > and I say this having implemented both simple and optimized encoders as
    > well as decoders. Algorithms like MD5 and Punycode and gzip are quite a
    > bit more complex

    Wait a moment: I've implemented MD5, UTF-1, UTF-7, and BOCU-1, but so far
    I have given up on SCSU. To say that it's horrible would be putting it mildly.

    One of the nice features of BOCU-1 is that a single error destroys at most
    one line. With UTF-8 a single error destroys at most one code point. Try
    that with SCSU and its various ways to encode the same piece of text.
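
To make the UTF-8 half of that claim concrete, here is a minimal sketch that overwrites one byte of a UTF-8 string and decodes the result leniently. The helper names (corrupt_and_decode, char_index_of_byte) and the sample text are made up for illustration, and the sketch only covers a substituted byte, not an inserted or deleted one. It says nothing about BOCU-1 or SCSU.

```python
def corrupt_and_decode(text, byte_index, new_byte):
    """Overwrite one byte of the UTF-8 form of text, then decode leniently."""
    data = bytearray(text.encode("utf-8"))
    data[byte_index] = new_byte
    # errors="replace" turns every undecodable piece into U+FFFD
    return data.decode("utf-8", errors="replace")


def char_index_of_byte(text, byte_index):
    """Index of the character whose UTF-8 encoding covers byte_index."""
    offset = 0
    for k, ch in enumerate(text):
        offset += len(ch.encode("utf-8"))
        if byte_index < offset:
            return k
    raise IndexError("byte_index is past the end of the encoded text")


# Hypothetical sample: "naive cafe" with accents, plus a euro sign.
sample = "na\u00efve caf\u00e9 \u20ac5"

# Byte 2 is the lead byte of the two-byte sequence for U+00EF.
damaged = corrupt_and_decode(sample, 2, 0x41)
k = char_index_of_byte(sample, 2)

# Everything before and after the damaged character survives unchanged.
print(damaged.startswith(sample[:k]), damaged.endswith(sample[k + 1:]))
```

The damaged character itself may turn into several replacement characters or into a different character, but its neighbours decode exactly as before, because no valid sequence ever starts with a continuation byte.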

    > BOCU-1 is less complex, but more obscure

    Not at all. It's a rather smart application of the 3*7 bits idea discussed
    in this thread; at some point it uses 1114111 = 2**20 + 2**16 - 1 (0x10FFFF)
    as the biggest possible "jump".
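
The arithmetic behind the 3*7 bits idea can be checked in a few lines. This is only a sketch: split_7bit and join_7bit are hypothetical helpers showing that any code point fits into three 7-bit groups. They are not the byte layout of the proposed UTF-21/24, and they are not BOCU-1's actual difference coding.

```python
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point, 1114111 decimal


def split_7bit(cp):
    """Split a code point into three 7-bit groups, most significant first."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("not a Unicode code point")
    return (cp >> 14) & 0x7F, (cp >> 7) & 0x7F, cp & 0x7F


def join_7bit(hi, mid, lo):
    """Inverse of split_7bit: rebuild the code point from its groups."""
    return (hi << 14) | (mid << 7) | lo


# The identity quoted in the text, and the fact that 21 bits are enough.
print(MAX_CODE_POINT == 2**20 + 2**16 - 1)
print(MAX_CODE_POINT.bit_length())
print(join_7bit(*split_7bit(MAX_CODE_POINT)) == MAX_CODE_POINT)
```

Since 3*7 = 21 bits can hold values up to 2**21 - 1 = 2097151, the largest distance between two code points, 0x10FFFF, fits with room to spare.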

    > an additional problem: its core algorithm is covered under a U.S.
    > patent (6,737,994) owned by IBM. Although they currently offer a
    > royalty-free license, IBM has been known to change their terms of
    > licensing from time to time.

    So far they haven't told me that my BOCU-1 script needs a license. Okay,
    that's no serious objection. IMO nobody needs a special compression for
    Unicode anyway. But in theory BOCU-1 is nice, especially compared
    with SCSU.

    > memories of the Unisys GIF patent are still too fresh in my mind.

    The LZW patent is expired worldwide now. It was possible to create
    uncompressed GIFs, (45 bytes)
    vs. clearlzw.gif (43 bytes) is an admittedly silly example.

