Re: Proposing UTF-21/24

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Jan 22 2007 - 14:37:41 CST


    On 1/22/2007 11:54 AM, Ruszlan Gaszanov wrote:
    > Doug Ewell wrote:
    >
    >
    >> SCSU and BOCU(-1) are most certainly plain-text encodings. Complexity
    >> does not disqualify them from that role, any more than it does for
    >> UTF-7. Their "specialization" is in representing Unicode text; they are
    >> relatively unsuitable for representing arbitrary integer values. I
    >> don't see how this makes them less useful for their intended purpose.
    >>
    >
    > Well, for short texts the additional computational cost of using SCSU and BOCU-1 is hardly justified.
    The interesting use scenario for code-level compression like SCSU and
    BOCU is the case of *lots* of little strings. SCSU was conceived for
    the transmission of short, independent packets that could not be
    bundled for compression, and so it missed a design feature needed for
    use in databases, which BOCU addresses. Too bad that we didn't get this
    right the first time: BOCU is now saddled with patent issues, and SCSU
    and BOCU must now compete for the same (small) corner of the universe.
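
To make the "lots of little strings" point concrete, here is a rough sketch in Python. It does not implement SCSU or BOCU. It uses zlib as a stand-in for general-purpose compression, and it only shows why compressing short strings one at a time tends not to pay off. The sample strings are made up for illustration.

```python
import zlib

# Hypothetical sample data: short, independent strings such as database
# fields or small packets. The values are illustrative only.
strings = ["Привет", "Москва", "Да", "Hello", "Спасибо", "Unicode", "Текст", "Ошибка"]

raw = [s.encode("utf-8") for s in strings]

# Total size of the uncompressed UTF-8 strings.
raw_total = sum(len(b) for b in raw)

# Each string compressed on its own, as when packets cannot be bundled.
independent_total = sum(len(zlib.compress(b)) for b in raw)

# All strings bundled into one stream and compressed once.
bundled_total = len(zlib.compress(b"\n".join(raw)))

print("raw UTF-8:             ", raw_total)
print("compressed one by one: ", independent_total)
print("compressed as a bundle:", bundled_total)
```

With strings this short, the fixed per-stream overhead of zlib (header and checksum) outweighs any savings, so the one-by-one total comes out larger than the raw UTF-8 total. That per-string overhead is the gap a code-level scheme is meant to fill when bundling is not an option.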
    > For longer texts, I personally do not see any significant advantages of using those formats instead of GP compression.
    Neither does anyone else - however, depending on the nature of your GP
    compression, SCSU + GP will be smaller than GP alone. (We've covered
    that on this list before, look for SCSU and LZW in the archives.)
    > On the other hand, many GP compression formats provide additional features, such as integrity checks and encryption. Also, since most popular GP compressions are already widely supported, application developers apparently do not see the need to implement specialized compression for plain text.
    >
    Precisely. SCSU, even though it has clear advantages in the 'many small
    strings' case, has the same problem of being "yet another format" that
    all the proposed UTFs have and will have. The cost of adding yet
    another arrow to *all* quivers is higher than the benefit of a slightly
    better arrow for a particular case.

    This has nothing to do with complexity.

    A./


