Re: Proposing UTF-21/24

From: Doug Ewell
Date: Mon Jan 22 2007 - 01:06:55 CST

    Ruszlan Gaszanov <ruszlan at ather dot net> wrote:

    > Well, SCSU and BOCU are too complex to be considered plain text
    > encodings, and do not provide significant advantages comparing to
    > general-purpose compression formats, while being much more
    > specialized. Therefore, their usability is questionable.

    SCSU and BOCU(-1) are most certainly plain-text encodings. Complexity
    does not disqualify them from that role, any more than it does for
    UTF-7. Their "specialization" is in representing Unicode text; they are
    relatively unsuitable for representing arbitrary integer values. I
    don't see how this makes them less useful for their intended purpose.

    The greatest roadblock to acceptance of SCSU is the *perception* that it
    is complex. It is not nearly as complicated as it is perceived to be,
    and I say this having implemented both simple and optimized encoders as
    well as decoders. Algorithms like MD5 and Punycode and gzip are quite a
    bit more complex, yet you don't hear anyone complaining that gzip should
    not be used because it's too complex.

    BOCU-1 is less complex than SCSU, though more obscure, and it has an
    additional problem: its core algorithm is covered under a U.S. patent
    (6,737,994) owned by IBM. Although they currently offer a royalty-free
    license, IBM has been known to change their terms of licensing from time
    to time. I've personally stayed away from BOCU-1 since the patent was
    disclosed -- memories of the Unisys GIF patent are still too fresh in my
    mind.

    Saying that compression-oriented encodings do not provide advantages over
    general-purpose compression turns out to be like saying "History shows
    that..." It's not that simple. There are certain GP formats that are
    relatively sensitive to the format of input data, and others that are
    not. It seems hard to justify playing the card that GP compression is
    more efficient, and at the same time playing the card that
    compression-oriented encodings (which are much less complex than GP
    compression) are too complex.
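
    One rough way to see how sensitive a given general-purpose compressor
    is to the format of its input is to compress the same text under
    several Unicode encoding forms and compare the sizes. The following is
    a minimal Python sketch, not a benchmark: the sample string, the list
    of encodings, and the helper name compressed_sizes are all assumptions
    made for illustration. The repeated sample compresses unrealistically
    well, so real text should be substituted before drawing conclusions.

```python
import bz2
import lzma
import zlib

# Hypothetical sample text (Cyrillic), chosen only for illustration.
# It is repeated to get a non-trivial length; replace it with real text.
SAMPLE = "Юникод представляет символы всех письменностей мира. " * 40

# Encoding forms to compare (an assumption; extend as needed).
ENCODINGS = ["utf-8", "utf-16-le", "utf-32-le"]

# General-purpose compressors from the Python standard library.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def compressed_sizes(text):
    """Return {(encoding, compressor): compressed byte count} for text."""
    sizes = {}
    for enc in ENCODINGS:
        raw = text.encode(enc)
        for name, compress in COMPRESSORS.items():
            sizes[(enc, name)] = len(compress(raw))
    return sizes


if __name__ == "__main__":
    for (enc, name), size in sorted(compressed_sizes(SAMPLE).items()):
        raw_len = len(SAMPLE.encode(enc))
        print(f"{enc:10} {name:5} raw={raw_len:6} compressed={size:6}")
```

    A compressor whose output size changes little from one encoding form
    to the next is relatively insensitive to input format; one whose
    output varies widely is not.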

    Doug Ewell  *  Fullerton, California, USA  *  RFC 4645  *  UTN #14
