Re: Medievalist ligature character in the PUA

From: verdy_p
Date: Thu Dec 17 2009 - 09:44:05 CST


    "Doug Ewell"
    > "verdy_p" wrote:
    > > But a variant based on the more general BOCU algorithm (patented by
    > > IBM and not usable without requesting a licence to IBM, with the only
    > > exception of BOCU-1 for which there's a free licence for free use as
    > > long as the implementation is conforming *strictly* to its
    > > specification and then disallows all extensions or variants)
    > According to the section "Intellectual Property" in UTN #6, users must
    > request a license from IBM to implement BOCU-1. Can you point me to a
    > passage somewhere that grants a "free license for free use" without
    > requesting permission from IBM?

    They are not required to get a licence if they use ICU to support BOCU-1 with full compliance. ICU is legally
    licensed in an open way that does not require individual permission from IBM: IBM has already given us this
    permission in ICU, by accepting its own open licensing terms.

    This also includes the possibility of using just a part of ICU (which is highly modular), modifying its code (as
    long as it remains fully compliant with the restricted BOCU-1 specification) and redistributing the modified code
    (for example, porting it to another language or OS), provided that this modification and redistribution comply with
    the ICU licensing terms (which include the need to keep the copyright notice, to inform about the ICU licence, and
    to provide a copy of this licence). In other words, you can safely optimize it for your needs and make any
    adaptation that is needed to make this code run successfully as expected.

    I have NOT said that BOCU (without the "-1" suffix) is open/free: it is effectively patented, and this means that
    you are NOT allowed to modify the implementation of BOCU-1 in a way that would not conform to its specification
    (this means, among other requirements, that you cannot reserve any bytes other than those 12 controls to make sure
    they will represent controls encoded on single bytes, you cannot preserve the full US-ASCII range, you MUST use and
    accept all the valid BOCU-1 byte values for properly encoding Unicode code points, and you MUST NOT encode surrogate

    As a consequence, it's impossible to adapt BOCU to make it conform to ISO 8859 requirements, or even to ISO 646
    requirements, or just to filesystem naming requirements (slashes, dots, or ASCII letter case folding), without
    asking for such a permission. (It's possible to do that with BOCU under such a licence, but completely impossible
    with BOCU-1 without breaking it.)

    The patent is however highly questionable: it attempts to cover cases that have long been free (notably, it
    covers all numeric bases, not just the base-243 used in BOCU-1). It could just as well cover Base64, hexadecimal,
    the Base85 of PostScript, or the encoding used in Punycode! The principle of decomposing numbers in a numeric
    base, and the principle of representing non-decimal digits with a single octet mapped differently from the numeric
    value of the digit, have been in use for a very long time. The same is true of the variable-length encoding of
    string lengths (just using bit-pattern prefixes here, for Huffman coding using predetermined statistics).
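
    To make the two old principles concrete, here is a minimal Python sketch (not taken from any BOCU or ICU code).
    It decomposes a number into digits in an arbitrary base, then maps each digit to a byte through an alphabet
    whose byte values differ from the digit values. The names to_digits, encode, decode, HEX and ASCII85_STYLE are
    made up for this illustration; the second alphabet merely imitates the "digit + 33" convention of PostScript's
    Base85 and is not a full Base85 codec.

```python
def to_digits(value, base):
    """Return the digits of a non-negative integer in `base`, most significant first."""
    if value < 0 or base < 2:
        raise ValueError("need value >= 0 and base >= 2")
    digits = []
    while True:
        value, digit = divmod(value, base)
        digits.append(digit)
        if value == 0:
            break
    return digits[::-1]


def encode(value, alphabet):
    """Write `value` in base len(alphabet), one alphabet byte per digit."""
    return bytes(alphabet[d] for d in to_digits(value, len(alphabet)))


def decode(data, alphabet):
    """Inverse of encode(): map bytes back to digits and recombine them."""
    digit_of = {byte: digit for digit, byte in enumerate(alphabet)}
    value = 0
    for byte in data:
        value = value * len(alphabet) + digit_of[byte]
    return value


# Two illustrative alphabets: the byte value is not the digit value.
HEX = b"0123456789ABCDEF"                  # base 16
ASCII85_STYLE = bytes(range(33, 33 + 85))  # base 85, digit d -> byte d + 33

print(encode(255, HEX))             # b'FF'
print(encode(1000, ASCII85_STYLE))  # b',b'  (1000 = 11 * 85 + 65)
print(to_digits(1000, 243))         # [4, 28] (1000 = 4 * 243 + 28)
```

    The same three functions work for any base, including 243, once an alphabet of that many byte values is chosen.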

    I just wonder which part of BOCU is really patented: it is only patented when used as a whole, but excluding all
    the prior art that it includes (including public-domain art like Huffman coding, UUCode, the various flavours of
    Base64 and the generic principles of TLV encoding, or art covered by earlier patents not owned by IBM, such as
    Base85 in Adobe PostScript).

    Maybe the only difference from other algorithms is that BOCU uses two distinct mappings from digits (whose values
    are all those of a single-base positional numeral system) into byte values: one subset of byte values (alphabet)
    used only for the remaining lower bits of the prefix byte (to encode the most significant digit), and another
    (larger) alphabet for the remaining digits.
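
    The idea of two alphabets can be sketched with a toy encoding of non-negative integers. This is NOT BOCU or
    BOCU-1: the byte ranges, the base of 223 and the names TRAIL, LEAD_RANGES, encode and decode are all
    hypothetical choices made for the illustration. The lead byte tells how many trail bytes follow and carries the
    most significant digit in a small alphabet; the trail bytes carry the other digits in a larger alphabet.

```python
# Toy layout, assumed for illustration only (not the BOCU-1 tables).
TRAIL = bytes(range(0x21, 0x100))  # larger alphabet: 223 byte values
BASE = len(TRAIL)
# (first lead byte, number of lead values) for 0, 1 and 2 trail bytes
LEAD_RANGES = [(0x21, 95), (0x80, 64), (0xC0, 64)]


def encode(value):
    """Encode a non-negative integer as one lead byte plus 0 to 2 trail bytes."""
    if value < 0:
        raise ValueError("value must be non-negative")
    offset = 0
    for n_trail, (first, count) in enumerate(LEAD_RANGES):
        span = count * BASE ** n_trail  # how many values this length can hold
        if value < offset + span:
            rest = value - offset
            trail = []
            for _ in range(n_trail):
                rest, digit = divmod(rest, BASE)
                trail.append(TRAIL[digit])
            # `rest` is now the most significant digit, 0 <= rest < count
            return bytes([first + rest] + trail[::-1])
        offset += span
    raise ValueError("value out of range for this toy encoding")


def decode(data):
    """Inverse of encode() for exactly one encoded value."""
    lead = data[0]
    offset = 0
    for n_trail, (first, count) in enumerate(LEAD_RANGES):
        if first <= lead < first + count:
            if len(data) != 1 + n_trail:
                raise ValueError("wrong number of trail bytes")
            value = lead - first
            for byte in data[1:]:
                value = value * BASE + TRAIL.index(byte)
            return offset + value
        offset += count * BASE ** n_trail
    raise ValueError("invalid lead byte")


print(encode(0))   # b'!'       (single byte 0x21)
print(encode(95))  # b'\x80!'   (first value that needs one trail byte)
```

    Small values take one byte, and no byte below 0x21 is ever produced, which is the kind of property that the
    choice of alphabets controls.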

    But there are other things that BOCU does not cover: the use of multiple bases (for example bases and subbases, as
    in the notation of time using various period units and subunits).
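
    As an illustration of such a multiple-base (mixed-radix) notation, here is a short Python sketch that splits a
    number of seconds into days, hours, minutes and seconds, and recombines them. The function names to_mixed_radix
    and from_mixed_radix are invented for this example.

```python
def to_mixed_radix(value, radices):
    """Split `value` into digits, most significant first.

    `radices` lists the base of each unit from the least significant
    upward, e.g. (60, 60, 24) for seconds, minutes, hours; whatever
    remains becomes the leading, unbounded digit (days here).
    """
    digits = []
    for radix in radices:
        value, digit = divmod(value, radix)
        digits.append(digit)
    digits.append(value)
    return digits[::-1]


def from_mixed_radix(digits, radices):
    """Inverse of to_mixed_radix()."""
    value = digits[0]
    for digit, radix in zip(digits[1:], reversed(radices)):
        value = value * radix + digit
    return value


TIME = (60, 60, 24)  # seconds per minute, minutes per hour, hours per day

print(to_mixed_radix(93784, TIME))           # [1, 2, 3, 4]: 1 d 2 h 3 min 4 s
print(from_mixed_radix([1, 2, 3, 4], TIME))  # 93784
```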

