RE: Proposing UTF-21/24

From: Ruszlan Gaszanov
Date: Mon Jan 22 2007 - 13:54:39 CST


    Doug Ewell wrote:

    > SCSU and BOCU(-1) are most certainly plain-text encodings. Complexity
    > does not disqualify them from that role, any more than it does for
    > UTF-7. Their "specialization" is in representing Unicode text; they are
    > relatively unsuitable for representing arbitrary integer values. I
    > don't see how this makes them less useful for their intended purpose.

    Well, for short texts the additional computational cost of using SCSU and BOCU-1 is hardly justified. For longer texts, I personally do not see any significant advantage in using those formats instead of general-purpose (GP) compression. On the other hand, many GP compression formats provide additional features, such as integrity checks and encryption. Also, since the most popular GP compression formats are already widely supported, application developers apparently do not see the need to implement specialized compression for plain text.
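
    A minimal sketch of this trade-off, assuming Python's standard zlib module as a stand-in for "GP compression" and made-up sample strings (neither comes from the discussion itself). It compares raw UTF-8 size with zlib output for a short and a long text, and shows the built-in Adler-32 check rejecting a corrupted stream:

```python
import zlib


def utf8_vs_zlib(text: str) -> tuple[int, int]:
    """Return (raw UTF-8 size, zlib-compressed size) in bytes."""
    raw = text.encode("utf-8")
    return len(raw), len(zlib.compress(raw, 9))


# Hypothetical sample data: six Cyrillic letters take 12 bytes in UTF-8.
short_text = "Привет"
long_text = "Привет, мир! " * 200

short_raw, short_packed = utf8_vs_zlib(short_text)
long_raw, long_packed = utf8_vs_zlib(long_text)

# A zlib stream ends with an Adler-32 checksum of the uncompressed data.
# Flipping bits in that trailer makes decompression fail with zlib.error.
packed = bytearray(zlib.compress(long_text.encode("utf-8")))
packed[-1] ^= 0xFF
try:
    zlib.decompress(bytes(packed))
    corruption_detected = False
except zlib.error:
    corruption_detected = True

print("short:", short_raw, "->", short_packed, "bytes")
print("long: ", long_raw, "->", long_packed, "bytes")
print("corruption detected:", corruption_detected)
```

    On the short string the zlib wrapper alone costs more than it saves, while the long, repetitive string shrinks considerably. The sketch says nothing about how SCSU or BOCU-1 would fare on the same input, since neither is in the Python standard library.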

    > Saying that compression formats do not provide advantages over
    > general-purpose compression turns out to be like saying "History shows
    > that..." It's not that simple. There are certain GP formats that are
    > relatively sensitive to the format of input data, and others that are
    > not.

    I'm not saying that specialized compression does not provide advantages over GP compression in general. For some types of data (like graphics, video and audio), specialized compression formats are indeed very useful and perform much better than GP compression. I just don't see much advantage in using specialized compression in the case of plain text.

    > It seems hard to justify playing the card that GP compression is
    > more efficient, and at the same time playing the card that
    > compression-oriented encodings (which are much less complex than GP
    > compression) are too complex.

    What I meant is that SCSU and BOCU-1 are too complex to serve as *uncompressed* formats and not efficient enough to serve as *compressed* formats. They are sort of trying to fit into both categories, but don't fit well in either.

