Re: BOCU patent (was: Re: Medievalist ligature character in the PUA)

From: Doug Ewell
Date: Fri Dec 18 2009 - 08:39:28 CST

    "verdy_p" <verdy underscore p at wanadoo dot fr> wrote:

    > Separate ranges has a benefit: it allows fast text search algorithms
    > to work reliably as it allows easy resynchronisation from random
    > positions.

    That is a fundamental feature of UTF-8 and UTF-16. I don't remember
    seeing a claim about separate ranges in the BOCU patent, but one would
    think an attempt to claim that as an innovation would be untenable.
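
    To make the "separate ranges" point concrete, here is a minimal sketch
    in Python of resynchronising from a random position in UTF-8 and
    UTF-16. The function names are hypothetical, chosen for illustration
    only, and the sketch assumes well-formed input. It relies solely on the
    fact that lead and trail units occupy disjoint value ranges.

```python
def resync_utf8(data: bytes, pos: int) -> int:
    """Back up from an arbitrary byte offset to the start of its character.

    Trail bytes are always 0x80..0xBF (bit pattern 10xxxxxx); ASCII and lead
    bytes never fall in that range, so no decoder state is needed.
    Assumes well-formed UTF-8 and 0 <= pos < len(data).
    """
    while pos > 0 and (data[pos] & 0xC0) == 0x80:
        pos -= 1
    return pos


def resync_utf16(units: list, pos: int) -> int:
    """Same idea for a list of 16-bit code units.

    A low (trail) surrogate is always 0xDC00..0xDFFF, so at most one step
    back is needed. Assumes well-formed UTF-16 and 0 <= pos < len(units).
    """
    if pos > 0 and 0xDC00 <= units[pos] <= 0xDFFF:
        pos -= 1
    return pos


text = "a\u20ac\U0001F600z"          # 1-, 3-, 4- and 1-byte characters in UTF-8
utf8 = text.encode("utf-8")
raw16 = text.encode("utf-16-be")
utf16 = [int.from_bytes(raw16[i:i + 2], "big") for i in range(0, len(raw16), 2)]

# Land in the middle of a character and find where it starts.
for offset in range(len(utf8)):
    print("UTF-8 offset", offset, "-> character starts at", resync_utf8(utf8, offset))
for index in range(len(utf16)):
    print("UTF-16 unit", index, "-> character starts at", resync_utf16(utf16, index))
```

    A byte-oriented search can use the same test to reject matches that
    begin in the middle of a character.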

    > I did not know that HTML5 *forbade* supporting some MIME-registered
    > charsets.
    > Do you mean instead that it forbids recognizing automatically when the
    > charset is unknown (not specified by the resource server, and not
    > specified with the source link) and must be guessed from the bytes
    > content of the stream ?

    From :

    "User agents must not support the CESU-8, UTF-7, BOCU-1 and SCSU
    encodings."

    Amazing, isn't it? So thoughtful of the HTML 5 WG to protect
    developers' time by prohibiting a handful of selected encodings. I can
    support Fieldata or PTTC/EBCD in my user agent if I want to, but not
    UTF-7 or SCSU.

    > You don't have to use ICU actually. ICU components can be fully
    > isolated and rewritten in any other language. But you have to include
    > its licence as your new work will be a derived work based on a
    > copyrighted work, even if it does not use any piece of its source
    > code.

    Right. So suppose I want to implement BOCU-1 from scratch, possibly in
    an attempt to speed up encoding or decoding? Can't do it without asking
    IBM for a license. (Note that I haven't actually looked at the ICU code
    to see if it is already optimally fast. You get the point.)

    > Almost all softwares today include several copyright notices

    I'm not interested, for the moment, in the copyright notices attached to
    software or libraries or other development tools. BOCU-1 is a
    compression encoding, a relatively straightforward way (compared to gzip
    and such) to represent Unicode characters as a sequence of bytes,
    similar to UTF-8 and -7 and -16 and -32 and SCSU and
    ASCII-with-XML-entities and all the rest. But only BOCU-1 among these
    requires me to even think about licenses.
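
    As a rough illustration of "a sequence of bytes", the sketch below
    prints one short string in several of the forms listed above, using
    only codecs that ship with Python. BOCU-1 and SCSU are absent because
    the Python standard library has no codec for either; the sample string
    and the `forms` dictionary are assumptions made for this example.

```python
text = "\u00e9\u20ac"   # e-acute followed by the euro sign

# Each entry is the same two characters in a different byte representation.
forms = {
    "UTF-8": text.encode("utf-8"),
    "UTF-7": text.encode("utf-7"),
    "UTF-16BE": text.encode("utf-16-be"),
    "UTF-32BE": text.encode("utf-32-be"),
    # Non-ASCII characters become numeric character references.
    "ASCII + XML entities": text.encode("ascii", errors="xmlcharrefreplace"),
}

for name, raw in forms.items():
    print(f"{name:<22} {len(raw):>2} bytes  {raw.hex(' ')}")
```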

    > For this reason, I don't consider the ICU licence intrusive and
    > blocking, and BOCU-1 as provided through ICU, is both a free (FSF
    > definition) and open (OSI definition) software which does not restrict
    > rewriting it completely.

    I haven't read the ICU license thoroughly, but I'd be surprised if
    "rewriting it completely" is allowed.

    Doug Ewell  |  Thornton, Colorado, USA  |
    RFC 5645, 4645, UTN #14  |  ietf-languages @