Re: Proposing UTF-21/24

From: Doug Ewell
Date: Tue Jan 23 2007 - 00:53:38 CST

    Frank Ellermann <nobody at xyzzy dot claranet dot de> wrote:

    > Wait a moment, I've implemented MD5, UTF-1, UTF-7, and BOCU-1, and so
    > far I gave up on SCSU. To say that it's horrible would be putting it
    > mildly.

    Before you give up for good, try reading the Appendix of UTN #14.

    > One of the nice features of BOCU-1, a single error destroys at most
    > one line. With UTF-8 a single error destroys at most one code point.
    > Try that with SCSU, and its various ways to encode the same piece of
    > text.

    I do not disagree; SCSU is very stateful.

    >> BOCU-1 is less complex, but more obscure
    > Not at all, it's a rather smart application of the 3*7 bits idea
    > discussed in this thread, at some point it uses 1114111 = 2**20 +
    > 2**16 -1 as biggest possible "jump".

    By "obscure" I meant "unknown to most people."

    It is possible (but invalid) in BOCU-1 to jump farther than ±0x10FFFF,
    not to mention jumps of much shorter distances that would land outside
    the valid code point range. In addition, jumps of +0x2DD0C or more,
    or of –0x2DD0D or less, are four-byte sequences, so the largest jump
    is not a three-byte one. I think you may be thinking of some other
    algorithm.
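
    For anyone who wants to check the arithmetic, here is a minimal sketch
    of where the byte-length boundaries fall. The constants are the BOCU-1
    reach values as I understand them from UTN #6; the function names
    (bocu1_diff_length, lands_in_range) and the "base" parameter are
    hypothetical, for illustration only, and this is not an encoder.

```python
# Sketch only: derives the BOCU-1 difference ranges from the lead/trail
# byte counts given in UTN #6 (assumed here, not taken from this thread).
MAX_CODE_POINT = 0x10FFFF

TRAIL_COUNT = 243   # number of byte values usable as trail bytes
SINGLE = 64         # one-byte differences cover -64..+63
LEAD_2 = 43         # two-byte lead bytes available in each direction
LEAD_3 = 3          # three-byte lead bytes available in each direction

REACH_POS_1 = SINGLE - 1
REACH_NEG_1 = -SINGLE
REACH_POS_2 = REACH_POS_1 + LEAD_2 * TRAIL_COUNT
REACH_NEG_2 = REACH_NEG_1 - LEAD_2 * TRAIL_COUNT
REACH_POS_3 = REACH_POS_2 + LEAD_3 * TRAIL_COUNT ** 2
REACH_NEG_3 = REACH_NEG_2 - LEAD_3 * TRAIL_COUNT ** 2


def bocu1_diff_length(diff):
    """Return how many bytes a signed difference needs (hypothetical helper)."""
    if not -MAX_CODE_POINT <= diff <= MAX_CODE_POINT:
        raise ValueError("difference can never be valid: %#x" % diff)
    if REACH_NEG_1 <= diff <= REACH_POS_1:
        return 1
    if REACH_NEG_2 <= diff <= REACH_POS_2:
        return 2
    if REACH_NEG_3 <= diff <= REACH_POS_3:
        return 3
    return 4


def lands_in_range(base, diff):
    """True if base + diff is a code point in 0..0x10FFFF (hypothetical helper)."""
    return 0 <= base + diff <= MAX_CODE_POINT


if __name__ == "__main__":
    print(hex(REACH_POS_3), hex(-REACH_NEG_3))   # last three-byte differences
    print(bocu1_diff_length(0x2DD0C), bocu1_diff_length(-0x2DD0D))
    print(2**20 + 2**16 - 1 == MAX_CODE_POINT)   # Frank's 1114111
    print(lands_in_range(0x10FFF0, 0x20))        # short jump, invalid target
```

    The last line is the "much shorter distances" case: a jump of only
    0x20 is perfectly encodable but still lands past 0x10FFFF, so a decoder
    has to validate the result and not just the size of the jump.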

    > So far they didn't tell me that my BOCU-1 script needs a license -
    > okay, that's no serious objection. IMO nobody needs a special
    > compression for Unicode anyway. But in theory BOCU-1 is nice,
    > especially if compared with SCSU.

    The letter from IBM reproduced in PDUTS #40 says: "IBM would like to
    bring to your attention, US Patent 6737994 'Binary-Ordered Compression
    For Unicode', which may contain claims necessary to, or which may
    facilitate the implementation of, BOCU-1." "Necessary to... the
    implementation of" means you cannot implement BOCU-1 without infringing
    on IBM's patent, unless IBM has granted you a license. IBM is known for
    enforcing its patents, sooner or later. I tried for months
    to obtain a developer-friendly clarification of this restriction --
    something akin to the "freely available" clause in UTR #16
    (UTF-EBCDIC) -- and was utterly unable to do so.

    >> memories of the Unisys GIF patent are still too fresh in my mind.
    > The LZW patent is expired worldwide now. It was possible to create
    > uncompressed GIFs, (45 bytes)
    > vs. clearlzw.gif (43 bytes) is an admittedly silly example.

    That's why I said the *memories* were fresh. When the BOCU patent
    expires, I might consider dusting off my (fully compliant) encoder and

    Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
