From: Asmus Freytag (firstname.lastname@example.org)
Date: Mon Jan 22 2007 - 14:37:41 CST
On 1/22/2007 11:54 AM, Ruszlan Gaszanov wrote:
> Doug Ewell wrote:
>> SCSU and BOCU(-1) are most certainly plain-text encodings. Complexity
>> does not disqualify them from that role, any more than it does for
>> UTF-7. Their "specialization" is in representing Unicode text; they are
>> relatively unsuitable for representing arbitrary integer values. I
>> don't see how this makes them less useful for their intended purpose.
> Well, for short texts the additional computational cost of using SCSU and BOCU-1 is hardly justified.
The interesting use case for code-level compression like SCSU and
BOCU is the case of *lots* of little strings. SCSU was conceived in
the context of transmitting short, independent packets that could not
be bundled for compression, and so it missed a design feature needed
for use in databases, which BOCU addresses. Too bad that we didn't get
this right the first time - BOCU is now saddled with patent issues, and
SCSU and BOCU must now compete for the same (small) corner of the
universe.
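To illustrate why the 'many small strings' case is awkward for GP
compression, here is a rough sketch in Python using the standard zlib
module. The sample strings and variable names are made up for
illustration, and the sketch does not implement SCSU or BOCU; it only
shows that a GP compressor applied to each short string independently
tends to add per-string overhead rather than save space, while bundling
(which the scenario above rules out) removes most of that overhead.

```python
import zlib

# Hypothetical sample data: short, independent strings such as
# database fields or small packets. Not taken from the original post.
strings = ["Привет", "Москва", "Unicode", "Ελληνικά",
           "données", "東京", "Zürich", "שלום"]

encoded = [s.encode("utf-8") for s in strings]

# Size of the plain UTF-8 data, with no compression at all.
raw_total = sum(len(b) for b in encoded)

# Each string compressed on its own: every result carries its own
# zlib header, checksum, and deflate block framing.
independent_total = sum(len(zlib.compress(b)) for b in encoded)

# All strings bundled into one stream before compression. This is
# the option that is NOT available when the strings must stay
# independently decodable.
bundled_total = len(zlib.compress(b"\n".join(encoded)))

print("raw UTF-8 bytes:       ", raw_total)
print("zlib, per string:      ", independent_total)
print("zlib, bundled together:", bundled_total)
```

With strings this short, the per-string figure should come out larger
than the raw UTF-8 size, which is the gap that a code-level scheme is
meant to fill.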
> For longer texts, I personally do not see any significant advantages of using those formats instead of GP compression.
Neither does anyone else - however, depending on the nature of your GP
compression, SCSU + GP will be smaller than GP alone. (We've covered
that on this list before; look for SCSU and LZW in the archives.)
> On the other hand, many GP compression formats provide additional features, such as integrity checks and encryption. Also, since most popular GP compressions are already widely supported, application developers apparently do not see the need to implement specialized compression for plain text.
Precisely. SCSU, even though it has clear advantages in the 'many small
strings' case, has the same problem of being "yet another format" that
all the proposed UTFs have and will have. The cost of adding yet
another arrow to *all* quivers is higher than the benefit of a slightly
better arrow for a particular case.
This has nothing to do with complexity.