In Asmus's defense, fewer recipients understand SCSU right now, so one needs to be a bit more careful about slinging it around. On the other hand, for anything beyond plain English text, it is quite a handy mechanism for interchanging Unicode text: where both sender and recipient understand it, it can reduce memory consumption and/or transmission time.
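To make the size argument concrete, here is a minimal C sketch of the compression idea. It is not ICU's implementation, and the function names are hypothetical. It assumes BMP-only input, uses only the eight default dynamic windows from the SCSU technical report (UTS #6), passes ASCII straight through, switches windows with SCn (0x10..0x17), and falls back to SQU (0x0E) for anything else. A real encoder would also define new windows (SDn), switch to Unicode mode, and handle supplementary characters.

```c
#include <stddef.h>
#include <stdint.h>

/* Initial dynamic window offsets defined by UTS #6 (windows 0..7). */
static const uint16_t kDefaultWindows[8] = {
    0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00
};

/* Index of a default window containing cp (preferring the active one),
   or -1 if no default window covers it. */
static int find_window(uint16_t cp, int active) {
    if (cp >= kDefaultWindows[active] && cp < kDefaultWindows[active] + 0x80)
        return active;
    for (int w = 0; w < 8; w++)
        if (cp >= kDefaultWindows[w] && cp < kDefaultWindows[w] + 0x80)
            return w;
    return -1;
}

/* Hypothetical helper: encodes BMP-only UTF-16 text with a small subset of
   SCSU single-byte mode (ASCII pass-through, SCn window switches, SQU as
   a fallback). out must have room for 3 * len bytes. Returns bytes written. */
size_t scsu_encode_subset(const uint16_t *text, size_t len, uint8_t *out) {
    size_t n = 0;
    int active = 0;                              /* window 0 starts active */
    for (size_t i = 0; i < len; i++) {
        uint16_t cp = text[i];
        if (cp == 0x00 || cp == 0x09 || cp == 0x0A || cp == 0x0D ||
            (cp >= 0x20 && cp <= 0x7F)) {
            out[n++] = (uint8_t)cp;              /* Basic Latin as-is */
            continue;
        }
        int w = find_window(cp, active);
        if (w >= 0) {
            if (w != active) {
                out[n++] = (uint8_t)(0x10 + w);  /* SCn: select window w */
                active = w;
            }
            out[n++] = (uint8_t)(0x80 + (cp - kDefaultWindows[w]));
        } else {
            out[n++] = 0x0E;                     /* SQU: quote the code unit */
            out[n++] = (uint8_t)(cp >> 8);
            out[n++] = (uint8_t)(cp & 0xFF);
        }
    }
    return n;
}
```

For the six-letter Russian word "Привет" (U+041F U+0440 U+0438 U+0432 U+0435 U+0442), this sketch emits 7 bytes, 12 9F C0 B8 B2 B5 C2: one SC2 to select the Cyrillic window at U+0400, then one byte per letter. The same word takes 12 bytes in UTF-16 and 12 in UTF-8.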
I should also mention that there is open-source code for SCSU available from IBM in ICU (C and Java):
In C:
  API reference: http://oss.software.ibm.com/icu/apiref/API1.5/API/scsu.h.html
  scsu.c in http://oss.software.ibm.com/developerworks/opensource/cvs/icu/source/common/
  scsu.h in http://oss.software.ibm.com/developerworks/opensource/cvs/icu/source/common/unicode/
In Java:
  SCSU.java, UnicodeCompressor.java, and UnicodeDecompressor.java in
  http://oss.software.ibm.com/developerworks/opensource/cvs/icu4j/icu4j/src/com/ibm/text/
Mark
Doug Ewell wrote:
> Asmus Freytag <asmusf@ix.netcom.com> wrote:
>
> > Unlike Plane 14, SCSU is not necessarily intended for unfettered
> > public interchange as if it was YAUTF (yet another utf). Yes, it can
> > be nice and small, but it assumes that the recipient has a conformant
> > decoder and can reliably detect when to invoke it.
>
> Isn't this true for UTF-8, UTF-16, and any other encoding form or TES
> of Unicode? Receivers of data in these formats also must be able to
> detect them and interpret them. How does SCSU differ in this regard?
>
> The SCSU technical report defines clearly what a conformant decoder must
> be able to do, and suggests a header sequence which would make auto-
> detection a relatively simple and accurate job (not many files start
> with the bytes 0E FE FF).
>
> Writing a conformant SCSU decoder turns out to be a rather straightforward
> job, not that much more work than writing a *good* UTF-8 decoder
> (with checking for illegal and irregular sequences).
>
> -Doug Ewell
> Fullerton, California
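Doug's two claims, that the suggested 0E FE FF header makes detection easy and that a decoder is not much work, can be illustrated with a short C sketch. It checks for the signature (SQU followed by U+FEFF) and decodes SCSU into UTF-16 code units, covering both modes, the SQn/SCn/SDn/SQU/SCU and UCn/UDn/UQU tags, and the window-offset table from UTS #6. It deliberately rejects SDX and UDX (extended windows for supplementary characters), so it is *not* a conformant decoder; the function names are hypothetical and are not ICU's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* UTS #6 static windows (SQn + byte < 0x80) and initial dynamic windows. */
static const uint16_t kStatic[8] = {
    0x0000, 0x0080, 0x0100, 0x0300, 0x2000, 0x2080, 0x2100, 0x3000};
static const uint16_t kDynamicDefaults[8] = {
    0x0080, 0x00C0, 0x0400, 0x0600, 0x0900, 0x3040, 0x30A0, 0xFF00};

/* True if the data starts with the suggested signature: SQU + U+FEFF. */
int scsu_has_signature(const uint8_t *in, size_t len) {
    return len >= 3 && in[0] == 0x0E && in[1] == 0xFE && in[2] == 0xFF;
}

/* Maps an SDn/UDn window-offset byte to a BMP offset; -1 if reserved. */
static long window_offset(uint8_t x) {
    static const uint16_t special[7] = {
        0x00C0, 0x0250, 0x0370, 0x0530, 0x3040, 0x30A0, 0xFF60};
    if (x >= 0x01 && x <= 0x67) return (long)x * 0x80;
    if (x >= 0x68 && x <= 0xA7) return (long)x * 0x80 + 0xAC00;
    if (x >= 0xF9) return special[x - 0xF9];
    return -1;  /* 0x00 and 0xA8..0xF8 are reserved */
}

/* Decodes SCSU into UTF-16 code units; out needs room for len units.
   Returns the unit count, or -1 on truncated or reserved input and on
   SDX/UDX, which this sketch does not implement. */
long scsu_decode_subset(const uint8_t *in, size_t len, uint16_t *out) {
    uint16_t win[8];
    memcpy(win, kDynamicDefaults, sizeof win);
    int active = 0, unicode_mode = 0;
    size_t i = 0;
    long n = 0;
    while (i < len) {
        uint8_t b = in[i++];
        if (!unicode_mode) {
            if (b >= 0x80) {                     /* char in active window */
                out[n++] = (uint16_t)(win[active] + (b - 0x80));
            } else if (b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0D ||
                       b >= 0x20) {              /* Basic Latin as-is */
                out[n++] = b;
            } else if (b <= 0x08) {              /* SQn: quote one char */
                if (i >= len) return -1;
                uint8_t t = in[i++];
                int w = b - 0x01;
                out[n++] = (uint16_t)(t < 0x80 ? kStatic[w] + t
                                               : win[w] + (t - 0x80));
            } else if (b >= 0x10 && b <= 0x17) { /* SCn: select window */
                active = b - 0x10;
            } else if (b >= 0x18) {              /* SDn: define + select */
                if (i >= len) return -1;
                long off = window_offset(in[i++]);
                if (off < 0) return -1;
                active = b - 0x18;
                win[active] = (uint16_t)off;
            } else if (b == 0x0E) {              /* SQU: quote a code unit */
                if (len - i < 2) return -1;
                out[n++] = (uint16_t)((in[i] << 8) | in[i + 1]);
                i += 2;
            } else if (b == 0x0F) {              /* SCU: to Unicode mode */
                unicode_mode = 1;
            } else {
                return -1;                       /* SDX (0x0B), reserved 0x0C */
            }
        } else if (b >= 0xE0 && b <= 0xE7) {     /* UCn: to single-byte mode */
            active = b - 0xE0;
            unicode_mode = 0;
        } else if (b >= 0xE8 && b <= 0xEF) {     /* UDn: define, single-byte */
            if (i >= len) return -1;
            long off = window_offset(in[i++]);
            if (off < 0) return -1;
            active = b - 0xE8;
            win[active] = (uint16_t)off;
            unicode_mode = 0;
        } else if (b == 0xF0) {                  /* UQU: quote a code unit */
            if (len - i < 2) return -1;
            out[n++] = (uint16_t)((in[i] << 8) | in[i + 1]);
            i += 2;
        } else if (b == 0xF1 || b == 0xF2) {
            return -1;                           /* UDX, reserved 0xF2 */
        } else {                                 /* big-endian code unit */
            if (i >= len) return -1;
            out[n++] = (uint16_t)((b << 8) | in[i++]);
        }
    }
    return n;
}
```

Every output unit consumes at least one input byte, which is why an output buffer of len units is always enough. Decoding 0E FE FF 12 9F C0 B8 B2 B5 C2 with this sketch yields U+FEFF followed by "Привет". Like a careful UTF-8 decoder, it treats truncated sequences and reserved bytes as errors rather than guessing.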
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:05 EDT