Re: Plane 14 tags and SCSU

From: Mark Davis ([email protected])
Date: Sun Jul 02 2000 - 15:27:39 EDT

Next message: John Hudson: "Re: The real problem?"
Previous message: [email protected]: "The real problem?"
Maybe in reply to: Doug Ewell: "Plane 14 tags and SCSU"
Next in thread: Asmus Freytag: "Re: Plane 14 tags and SCSU"

In Asmus's defense, fewer recipients can understand SCSU right now, so one needs to be a bit more careful about slinging it around. On the other hand, for anything beyond plain English text, it is quite a handy mechanism for interchanging Unicode text: where both sender and recipient understand it, it can reduce memory consumption and/or transmission time.

I should also mention that there is open-source code for SCSU available from IBM in ICU (C and Java):

In C:
http://oss.software.ibm.com/icu/apiref/API1.5/API/scsu.h.html
http://oss.software.ibm.com/developerworks/opensource/cvs/icu/source/common/
look at scsu.c
http://oss.software.ibm.com/developerworks/opensource/cvs/icu/source/common/unicode/
look at scsu.h

In Java:
http://oss.software.ibm.com/developerworks/opensource/cvs/icu4j/icu4j/src/com/ibm/text/
look at SCSU.java, UnicodeCompressor.java, and UnicodeDecompressor.java

Mark

Doug Ewell wrote:

> Asmus Freytag <[email protected]> wrote:
>
> > Unlike Plane 14, SCSU is not necessarily intended for unfettered
> > public interchange as if it were YAUTF (yet another UTF). Yes, it can
> > be nice and small, but it assumes that the recipient has a conformant
> > decoder and can reliably detect when to invoke it.
>
> Isn't this true for UTF-8, UTF-16, and any other encoding form or TES
> of Unicode? Receivers of data in these formats also must be able to
> detect them and interpret them. How does SCSU differ in this regard?
>
> The SCSU technical report defines clearly what a conformant decoder must
> be able to do, and suggests a header sequence which would make auto-
> detection a relatively simple and accurate job (not many files start
> with the bytes 0E FE FF).
>
> Writing a conformant SCSU decoder turns out to be a rather
> straightforward job, not that much more work than writing a *good* UTF-8
> decoder (one that checks for illegal and irregular sequences).
>
> -Doug Ewell
> Fullerton, California


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:05 EDT