BOCU patent (was: Re: Medievalist ligature character in the PUA)

From: Doug Ewell (doug@ewellic.org)
Date: Thu Dec 17 2009 - 23:23:43 CST

    "verdy_p" <verdy underscore p at wanadoo dot fr> wrote:

    >> According to the section "Intellectual Property" in UTN #6, users
    >> must request a license from IBM to implement BOCU-1. Can you point
    >> me to a passage somewhere that grants a "free license for free use"
    >> without requesting permission from IBM?
    >
    > They are not required to get a licence if they use ICU to support
    > BOCU-1 with full compliance, which is legally licenced in an open way
    > that does not require a personal permission by IBM: IBM has already
    > given us this permission in ICU, accepting its own open licencing
    > terms.

    I don't count it as a free license for free use if I have to use a
    certain vendor's tool, no matter how wonderful it is -- especially if
    that tool has licensing terms, no matter how liberal they are. This
    might be fine for some technologies and file formats, but we are talking
    about a *character encoding*, for heaven's sake. I should be able to
    write my own implementation in 6502 assembly code if I want to.

    > I have NOT said that BOCU (without the "-1" suffix) is open/free:

    I know you haven't. It is patented, and because of that, profiles of
    BOCU such as BOCU-1 are patented too. But then, Marcia Courtemanche
    already told us that.

    > As a consequence, it's impossible to adapt BOCU to make it conforming
    > to ISO 8859 requirements, or even to ISO 646 requirements, or just to
    > filesystem naming requirements (slashes, dots, or ASCII letter case
    > folding), without asking for such a permission. (It's possible to do
    > that with BOCU with such licence, but completely impossible with
    > BOCU-1 without breaking it).

    One of the claims in the BOCU patent is "[t]he method... wherein the
    characters requiring higher code point numbers [than U+0020] are Greek."
    I take that to mean that ASCII opacity is part of the nature of BOCU.

    > The patent is however highly questionable: it attempts to cover cases
    > that have long been free (notably, it covers all numeric bases, not
    > just the base-243 used in BOCU-1): it could as well cover Base64,
    > hexadecimal, the Base85 of PostScript, or the encoding used in
    > Punycode! The principle of decomposing numbers in a numeric base, and
    > the principle of representing non-decimal digits with a single octet
    > mapped differently from the numeric value of the digit, have been in
    > use for a very long time. The same is true of the variable-length
    > encoding of string lengths (just using bit-pattern prefixes here, for
    > Huffman coding using predetermined statistics).

    Makes you wonder what sort of research is being done by the USPTO.
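
The general idea described in the quoted paragraph, splitting a number into digits in some base and writing each digit as a byte chosen from an alphabet, can be sketched as follows. This is only a minimal illustration in Python, not BOCU-1 and not anything taken from the patent; the names `to_digits`, `encode_with_alphabet`, and `HEX_ALPHABET` are made up for the example, and hexadecimal is used as the alphabet simply because it is familiar.

```python
def to_digits(value, base):
    """Return the digits of a non-negative integer in the given base,
    most significant digit first."""
    if value < 0 or base < 2:
        raise ValueError("value must be >= 0 and base must be >= 2")
    digits = []
    while True:
        value, digit = divmod(value, base)
        digits.append(digit)
        if value == 0:
            break
    return digits[::-1]


def encode_with_alphabet(value, alphabet):
    """Write each digit as the byte found at that index in `alphabet`.
    The base is simply the size of the alphabet."""
    return bytes(alphabet[d] for d in to_digits(value, len(alphabet)))


# Hypothetical alphabet for the example: the 16 hexadecimal digits.
HEX_ALPHABET = b"0123456789ABCDEF"

print(encode_with_alphabet(255, HEX_ALPHABET))  # b'FF'
print(to_digits(243, 243))                      # [1, 0]
```

Swapping in a different alphabet (64 bytes for something Base64-like, or a larger set of byte values) changes the base without changing the method.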

    > Maybe the only difference from other algorithms is that BOCU uses two
    > distinct mappings from digits (whose values are all those of a
    > single-base positional numeric system) into byte values: one subset
    > of byte values (alphabet) for the remaining lower bits only in the
    > prefix byte (to encode the most significant digit), and another
    > (larger) alphabet for the remaining digits.

    I'm not sure what this means, but all multiple-byte character encodings
    have different ranges for lead bytes and trail bytes. Self-delimiting
    numeric values use a different range for the last byte of the sequence.
    So this idea isn't novel either.
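
Two familiar instances of the point above, sketched in Python as an illustration only: UTF-8 keeps lead bytes and trail bytes in disjoint ranges, and a MIDI-style variable-length quantity marks the last byte of a sequence by clearing its high bit. The function names `utf8_byte_role` and `encode_vlq` are invented for this example, and neither function has anything to do with BOCU-1 itself.

```python
def utf8_byte_role(b):
    """Classify a single byte value by the role it can play in UTF-8."""
    if b <= 0x7F:
        return "single"   # ASCII, a complete character on its own
    if 0x80 <= b <= 0xBF:
        return "trail"    # continuation byte
    if 0xC2 <= b <= 0xF4:
        return "lead"     # first byte of a 2-, 3-, or 4-byte sequence
    return "invalid"      # 0xC0, 0xC1, 0xF5..0xFF never occur in UTF-8


def encode_vlq(value):
    """Encode a non-negative integer as a MIDI-style variable-length
    quantity: 7 bits per byte, most significant group first, with the
    high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("value must be >= 0")
    out = [value & 0x7F]            # last byte: high bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(out))


print([utf8_byte_role(b) for b in "\u20ac".encode("utf-8")])
# ['lead', 'trail', 'trail']
print(encode_vlq(128).hex())  # 8100
```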

    I'd be surprised to see any real-world text encoded in BOCU-1, not only
    because it's probably the world's only IP-encumbered character encoding,
    but because it has been stigmatized by the HTML 5 Working Draft
    <http://www.w3.org/TR/html5/>, which actually *forbids* conformant user
    agents from recognizing it (along with CESU-8, UTF-7, and SCSU).

    --
    Doug Ewell  |  Thornton, Colorado, USA  |  http://www.ewellic.org
    RFC 5645, 4645, UTN #14  |  ietf-languages @ http://is.gd/2kf0s
    

