Doug Ewell <email@example.com> sayeth:
> I have been studying Technical Report #6 on the Standard Compression
> Scheme for Unicode*, and I am running into a problem that perhaps
> one of the gurus on this list can explain for me. In the example
> for Russian, the compressed data begins with an SC7 tag (0x17),
> which maps the subsequent characters 0x80 through 0xFF into the
> default position of (dynamic) window 7, as the accompanying text
> points out. However, according to Table X-5, the default offset for
> window 7 is 0xFF00. Window 2, on the other hand, does default to
> offset 0x0400 and would seem to be the correct window for Cyrillic
> (and is identified as such in the table). The proper tag would then
> be SC2 (0x12). Am I missing something, or is there an error in the
> technical report?
I have complained about this and other contradictions in tr6.html
before; see http://czyborra.com/scsu/errata.mbox.gz for details.
Try http://czyborra.com/zcat.cgi/scsu/errata.mbox.gz if you do not
have gzip <http://www.gzip.org/>.
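The window arithmetic in the question can be made concrete with a minimal Python sketch of SCSU single-byte-mode decoding. It handles only the SC0-SC7 tags (0x10-0x17) and the eight default dynamic-window offsets from UTS #6. It deliberately ignores the SQn, SDn, SDX, SQU and SCU tags and Unicode mode, so it is not a full SCSU decoder. The function name `decode_single_byte` and the sample bytes are illustrative only; they are not taken from the technical report.

```python
# Default offsets of SCSU dynamic windows 0-7 (per UTS #6).
DEFAULT_WINDOW_OFFSETS = [
    0x0080,  # window 0: Latin-1 Supplement
    0x00C0,  # window 1: Latin letters U+00C0..U+013F
    0x0400,  # window 2: Cyrillic
    0x0600,  # window 3: Arabic
    0x0900,  # window 4: Devanagari
    0x3040,  # window 5: Hiragana
    0x30A0,  # window 6: Katakana
    0xFF00,  # window 7: Fullwidth ASCII
]


def decode_single_byte(data: bytes) -> str:
    """Sketch: decode SCSU single-byte mode, handling only SC0-SC7 tags."""
    active = 0  # an SCSU stream starts in single-byte mode with window 0 active
    out = []
    for b in data:
        if 0x10 <= b <= 0x17:
            # SCn: make dynamic window n the active window
            active = b - 0x10
        elif b >= 0x80:
            # high byte: offset of active window plus (byte - 0x80)
            out.append(chr(DEFAULT_WINDOW_OFFSETS[active] + (b - 0x80)))
        elif b >= 0x20 or b in (0x00, 0x09, 0x0A, 0x0D):
            # ASCII and the pass-through control characters map directly
            out.append(chr(b))
        else:
            raise NotImplementedError(f"tag 0x{b:02X} is not handled in this sketch")
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical input: SC2 followed by six bytes for the Russian word "Moskva"
    moscow = bytes([0x12, 0x9C, 0xBE, 0xC1, 0xBA, 0xB2, 0xB0])
    assert decode_single_byte(moscow) == "\u041c\u043e\u0441\u043a\u0432\u0430"
    # The same six bytes behind SC7 land in the fullwidth block instead
    print(ascii(decode_single_byte(b"\x17" + moscow[1:])))
```

With SC2 the high bytes decode to Cyrillic starting at U+0400. Behind SC7 the same bytes fall into the fullwidth forms at U+FF00, which is the mismatch the question points out.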
> * What's wrong with the shorter and more straightforward "Standard
> Unicode Compression Scheme," anyway? Someone got a problem with
> the abbreviation?
Yes, "SUCS" sucks. The "UCS" part could easily be misread as
standing for the Universal Character Set, and that's not what it is
about: SCSU is a character encoding scheme, not a new coded character
set. SCSU merely uses the UCS as its underlying CCS. It is no [...]
"SCSU" has been around for more than a year. The Reuters predecessor
was also called "RCSU". The sample implementation calls itself
"SCSU". I have requested IANA charset registration for that label. We
haven't had any complaints from some South Carolina State University
owning the "SCSU" trademark. So let us please stick to "SCSU".
This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:42 EDT